PREFACE


This book has been prepared for a one-semester basic course in
elementary statistical analysis which, at Princeton, is the introductory
course for all fields of statistical application and is usually taken in
the freshman year. It is especially designed for those who intend to go into
the biological and social sciences. It presupposes one semester of elementary
mathematical analysis covering topics such as those included in the
first half of F. L. Griffin’s Introduction to Mathematical Analysis. The
material has been developed from two years of experience with such a course.

An effort has been made throughout the book to emphasize the role
played in statistical analysis by a sample of measurements and a population
from which the sample is supposed to have arisen. Only three chapters are devoted
to elementary descriptive statistics of a sample of measurements. In
these three chapters the idea of a population is presented on a purely intuitive
basis. Probability concepts are then introduced. This makes it possible
to use these basic concepts at an early stage in dealing more critically with
the idea of a population and sampling from a population. Considerable attention
is given to the application of sampling principles to the simpler problems
of statistical inference, such as determining confidence limits for population
means and differences of means, making elementary significance tests, testing
for randomness, etc. No attempt has been made here (in fact, there is not
enough time in one semester!) to go into analysis of variance and more sophisticated
problems of statistical inference. An elementary treatment of the analysis
of pairs of measurements, including least squares methods, is presented. Special
effort has been made throughout the book to keep the mathematics elementary and
to state specifically at which points the mathematics is too advanced to present.


The course in which this material has been used has been conducted
satisfactorily (not ideally!) without the use of a computing laboratory. The
problems in the exercises have been selected so that computations can be carried
out effectively with a small handbook of tables such as
C. D. Hodgman’s Mathematical Tables from Handbook of Chemistry and Physics.

The author would like to express his appreciation to Professor
R. A. Fisher and Messrs. Oliver and Boyd for permission to use the material in
Table 10.2; to Professor E. S. Pearson and the Biometrika Office for permission
to reprint Figures 10.1 and 13.6; to Dr. C. Eisenhart and Miss Freda S. Swed
for permission to use the material in Table 12.3; and to the College Entrance
Examination Board for permission to reprint Figure 13.5.

Finally, the author takes this opportunity to acknowledge the benefit
of many helpful discussions he has had with his colleagues, Professors
A. W. Tucker and J. W. Tukey, and Professor F. Mosteller of Harvard University
during the preparation of the material. He is also indebted to Drs. K. L. Chung,
P. Otter and D. F. Votaw, who assisted with the preparation of Chapters 4, 6, 7
and 8; to Mr. J. G. C. Templeton, who checked the computations and proofread
the manuscript; to Mr. R. B. Murphy, who drew the figures; and to Mrs.
Frances Purvis, who typed the manuscript.

The material in this book is still in a tentative form. Any errors
or weaknesses in presentation are solely the responsibility of the author. Corrections,
criticisms, and expressions of other points of view on the teaching
of such a course will be gratefully received.


S. S. Wilks 


Princeton, New Jersey

September 1948
CHAPTER 1. INTRODUCTION


1.1 General Remarks.

To many persons the word statistics means neatly arranged tables of
figures and bar charts printed in the financial sections of newspapers or issued by
almanac publishers and government agencies. They have the impression that statistics
are figures used by persons called statisticians to prove or disprove
something. There is plenty of ground for this impression. Anyone who tries to
make sense out of a set of observational or experimental data is assuming the
role of a statistician, whether he is a business executive, a medical
research man, a biologist, a public opinion poller or an economist. Some sets
of data are very simple, and the implications and conclusions inherent in the
data are obvious. Other data, however, are complex and may trick and confuse
the statistical novice, even though he may be an expert in the subject-matter
field from which the data came. The only way to reduce this confusion is through
scientific methods of collecting, analyzing and interpreting data. Such methods
have been developed and are available. Expert statisticians well-versed in these
methods can and do reach sound conclusions from a given set of data, and these
conclusions differ very little from one statistician to another. This is evidence
that there are no real grounds for the naive claim that statistics can prove
anything. Some of the most dangerously deceptive uses of statistics occur in
situations where correct conclusions are drawn which seem to depend on the
statistics when, in fact, the statistics have little if anything to do with the
original question. While mathematics cannot protect a person from this danger
directly, familiarity with numerical analysis will make it easier to spot such
hidden fallacies.

Modern statistical method is a science in itself, dealing with such
questions as: How shall a program of obtaining data be planned so that reliable
conclusions can be made from the data? How shall the data be analyzed? What
conclusions are we entitled to draw from the data? How reliable are the conclusions?
To try to present all the statistical methods that are known and
used at present would be an encyclopaedic venture which would lead us deeply
into statistical theory and many subject matter fields. However, there is a
body of fundamental concepts and elementary methods which can be presented in
a beginning course. The purpose of this course is to do just this, and to illustrate
the concepts and methods with simple examples and problems from various
fields.

You will ask at this juncture what kinds of situations come up which
involve these fundamental concepts and elementary methods. Series or sets of
raw statistical observations or measurements arise in many ways and in many
different fields. The number of observations or measurements needed or feasible
varies tremendously from one situation to another. In some cases, as in peacetime
firing of large-caliber naval guns, only a very small sample of measurements
(of where the shells actually fall) can be obtained because of cost. In other
cases, as for example in a Gallup poll for a presidential election, the number
of observations runs into the thousands. In some situations the sample comprises
the entire population of measurements or observations which could be made,
particularly in census-type work where complete enumerations of populations are
made. Federal and state government agencies and national associations compile
data on entire populations of objects, e.g., births, deaths, automobile registrations,
number of life insurance policies, etc., etc.

There are two general types of statistical observations: (1) quantitative
and (2) qualitative. We shall discuss these separately.

1.2 Quantitative Statistical Observations.

By quantitative statistical observations we mean a sequence or set of
numerical measurements or observations made on some or all of the objects in a
specified population of objects. If the observations are made on only some of the
objects, we call the set of observations a sample. Let us illustrate by some
examples.

Suppose a men's clothing store proprietor writes down from sales slips
the sizes of men's overcoats sold every other week during September and October. He
would end up with a list of numbers that might run something like this: 56, 42,
44, 30, 40, 36, ... and so on for 145 numbers. The list of numbers written down
constitutes a sample of sizes from the population of overcoats he has sold during
September and October.

By analyzing a series of specimens from a certain deposit of ore
for percent of iron, a chemist might turn up with one measurement on each
of five specimens, something like this: 28.2, 27.6, 29.3, 28.2, 30.1. This is
a sample of iron percentage measurements from five specimens out of an extremely
large population of possible specimens from the deposit.

A record-keeping bridge player might keep track of the number of honor
cards he gets in 200 bridge hands, finding some such sequence as: 9, 5, 7, 2, 4,
8, 0, 3 and so on for 200 numbers. He would therefore accumulate honor card counts
in a sample of 200 hands out of a population of indefinitely many hands which could
conceivably be dealt under a given shuffling and cutting practice.

A quality control inspector interested in maintaining control of the
inner diameter of bushings turned out by an automatic lathe would pick a bushing
every 30 minutes and measure its inner diameter, obtaining some such sequence as
this: 1.001", .998", .999", 1.001", 1.002", etc. He is selecting a sample of
bushings out of the population of bushings being manufactured by this lathe.

A Princeton personnel researcher goes through the record cards of all
246 freshmen who took Mathematics 103 and writes down two numbers for each student:
his College Board mathematics score and his final group in Mathematics 103. His
sequence of pairs of numbers (arranged alphabetically with respect to students’
names) might run like this: (680, 3+), (740, 1-), (530, 5), (620, 3), (510, 6) and
so on for 246 pairs of scores. In this case the sample would consist of all of
the freshmen in the population of freshmen who took Mathematics 103.

Notice that in this last example each quantity in the sequence consists 
of two measurements. We could mention many examples in which the sequence would 
contain not only pairs, but sets of three, four or more measurements. 

We could continue with dozens of such examples. It is to be noted that
in every example mentioned, the series of statistical measurements may be regarded
as a sample of measurements from a population of measurements. In general,
there are two kinds of populations: finite populations and indefinitely large
populations. For example, the undergraduates now enrolled at Princeton constitute
a finite population. The licensed hunters of Pennsylvania form a finite population.
The sequence of numbers of dots obtained by rolling a pair of dice indefinitely
many times is an indefinitely large population. In the case of the dice,
the indefinitely large population consisting of the sequence of dots is generated
by successively rolling the dice indefinitely many times. The population in this
case depends on various factors, such as the dice themselves (which may be slightly
biased), the method of throwing them and the surface on which they are thrown.

If the dice were "perfectly true", and if they were thoroughly shaken before
throwing and thrown on a "perfect" table top, we could imagine having
an "ideal" population. We can use probability theory (to be discussed in later
sections) to predict characteristics of this "ideal" population, such as the fraction
of rolls of two dice in which 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 dots will
occur "in the long run", the fraction of sets of three rolls of two dice in which
6, 2, 11 dots are obtained in that order, and so on. In the case of the lathe
turning out bushings, we may essentially consider the population to be indefinitely
large, since the population is being generated by the production of one bushing
after another, with no consideration of a "last bushing" (which, as a matter of
fact, sooner or later will be made). But the important thing about this population
of bushings is that it actually changes because of tool wear, changes in the
raw material from which bushings are made, changes of operators, etc. For any
particular shift of operators the population may be fairly constant, and a sample
of inner diameters of bushings taken during that shift may be considered as a
sample from an indefinitely large potential population of bushings that might be
turned out under the particular conditions of that shift. Even in the case of a
finite population of objects, a given sampling procedure might be such that, when
applied to a relatively small number of objects in the population, it essentially
begins to generate a population different from the finite population one thinks
he is sampling. For example, if one should take every 20th residence listed in
the Princeton telephone directory and call the number for information about that
residence, one has, on the face of it, a sampling procedure which might be expected
to yield information from which one could make accurate inferences about
the population of Princeton residences with telephones. Actually, there will be
a substantial number of residences for which there will be no response. If we
take the sample of residences in which a response is obtained, our sampling procedure
is not sampling the population of residences with telephones; it is
sampling the population of residences with telephones in which telephones are
answered. These two populations of residences are actually different. For
example, the second tends to have larger families and more old people and other
stay-at-home types of people in them. Of course, if we make enough repeated
telephone calls to the residences that did not answer the telephone originally,
we would then be sampling the first population.
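The "ideal" dice population mentioned above can be worked out by listing the 36 equally likely ways two dice can fall. The Python sketch below is only an illustration under exactly the idealization in the text (perfectly true dice, independent throws); the function name `two_dice_sum_probabilities` is hypothetical. It gives the long-run fraction of throws showing each total from 2 to 12, and the fraction of sets of three throws giving 6, 2 and 11 dots in that order.

```python
from fractions import Fraction
from itertools import product


def two_dice_sum_probabilities():
    """Long-run fraction of throws of two ideal dice showing each total 2..12.

    Assumes both dice are perfectly true, so the 36 ordered outcomes
    (first die, second die) are equally likely.
    """
    counts = {}
    for first, second in product(range(1, 7), repeat=2):
        total = first + second
        counts[total] = counts.get(total, 0) + 1
    # Keep totals in increasing order and express each count as a fraction of 36.
    return {total: Fraction(count, 36) for total, count in sorted(counts.items())}


probs = two_dice_sum_probabilities()
for total, p in probs.items():
    print(total, p)

# Assuming successive throws are independent, the fraction of sets of three
# throws giving 6, then 2, then 11 dots is the product of the three fractions.
p_6_2_11 = probs[6] * probs[2] * probs[11]
print(p_6_2_11)  # 5/23328
```

Exact fractions are used rather than decimals so that the results (for example 1/6 for a total of 7) can be compared directly with hand calculations.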

What is supposed to be done with samples of measurements? The main
reason for keeping track of such measurements is not simply to accumulate a lot
of numbers, but, in general, to try to learn something about the main features
of the set of numbers (their average, how much they vary from one another,
etc.) for the purpose of making inferences about the population from which
they can be considered as having been "drawn". None of these measurement-makers
wants to get any more data than necessary to make these inferences.

Once he has what he thinks is a pretty sound inference as to what the population
is (that is, a "reasonably" accurate description of it from the sample),
he can then begin to consider what ought to be done (perhaps nothing) to change
it in some direction or other which will be to his advantage, or, more often,
to use this information elsewhere.

The clothing store proprietor can find out from a sample whether he
is stocking the right distribution of sizes of overcoats; the chemist (or rather
his boss) can use the results of his sample of analyses to help decide whether
the iron ore is worth mining; the bridge player can satisfy his curiosity as
to how frequently various numbers of honor cards are obtained (since he presumably
does not want to try to figure these things out mathematically on the
assumption of perfect shuffling); the quality control expert can see whether
the inner diameters of his bushings are being kept within the specified tolerances
and, if not, whether the holes are being made too large or too small and
by how much; the personnel researcher can determine how high the relationship
or correlation is between the College Board mathematics test and the final group
in Mathematics 103, and whether it is high enough to make useful predictions as
to how well each entering freshman can be expected to do in Mathematics 103
from a knowledge of his College Board mathematics score.

Evidently, condensing the sample data in some way is vital in any one
of these problems. The first thing that has to be learned in statistics is how
to condense the sample data and present them satisfactorily. The main thing that
has to be learned is what kind of inferences or statements can be made from the
sample about the population sampled and how reliable these inferences are. The
simplest thing that can be done in condensing and describing samples of quantitative
data is to make frequency distributions and describe them by calculating
certain kinds of averages. Such quantities, calculated from samples in order to describe
them, are called statistics. Similarly, populations are described by
population parameters.

Only rarely is it possible to know precisely the values of population
parameters, simply because only rarely does one have the data for the entire
population. The usual situation is that one has only a sample from the population.
Hence the usual problem is to calculate statistics from the sample frequency
distribution and then try to figure out from the values of these statistics
what the values of the parameters of the population are likely to be. In the case
of extremely large samples, the statistics of properly drawn samples will have
values very close to those of the corresponding population parameters. For
example, the average of a very large sample of measurements "randomly drawn"
from a population will be very close to the average of the entire population
of measurements. But in the case of small samples the discrepancies become
larger, and the problem of inferring the values of population parameters from
sample statistics becomes more complicated and has to be settled by means of
probability theory.
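The contrast between large and small samples can be seen in a small simulation. The sketch below is a hypothetical illustration, not a method from the text: it takes as its population the 36 equally likely totals of two ideal dice (so the population average, a parameter, is exactly 7), draws random samples of several sizes with Python's `random` module, and prints each sample average (a statistic) with its discrepancy from 7. The seed and sample sizes are arbitrary choices.

```python
import random
import statistics
from itertools import product

# Hypothetical population: totals of two ideal dice, one entry per equally
# likely outcome, so its average (a population parameter) is known exactly.
population = [a + b for a, b in product(range(1, 7), repeat=2)]
population_mean = statistics.mean(population)  # 7

rng = random.Random(1948)  # fixed seed so that repeated runs agree


def sample_mean(n):
    """Sample statistic: the average of n values drawn at random, with replacement."""
    return statistics.mean(rng.choice(population) for _ in range(n))


for n in (5, 50, 500, 50_000):
    estimate = sample_mean(n)
    gap = abs(estimate - population_mean)
    print(f"n={n:>6}  sample mean={estimate:.3f}  discrepancy={gap:.3f}")
```

Any single small sample may happen to land close to 7; the point is that the discrepancy tends to shrink as n grows, and how large it is likely to be for a given n is exactly the question probability theory settles.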

There is a source of information in sequences of observations which
is particularly useful in such fields as the analysis of data from scientific experiments
and industrial research, development and production. This is the information
contained in the way in which measurements jump about from value to
value as one goes through the sequence of sample measurements in the order in
which they are made. The usual frequency distribution analysis (to be studied
in Chapters 2 and 3) does not take account of this information, but we shall
discuss this kind of sequence analysis in Chapter 12.

1.3 Qualitative Statistical Observations.

By qualitative statistical observations we mean a sequence of observations
in which each observation in the sample (as well as in the population)
belongs to one of several mutually exclusive classes which are likely to be
non-numerical. Let us consider some examples.

A person tosses a coin 50 times and obtains some such sequence as
H, H, T, T, H, T and so on (H = heads, T = tails). He is essentially drawing a
sample of 50 tosses out of an indefinitely large population of tosses, and is
making an observation on each toss as to whether it is an H or a T.

A movie producer's polling agent stationed at the exit of the Princeton
Playhouse, asking outgoing moviegoers whether or not they liked Movie X (just
seen), might get a sequence of 100 answers starting off like this: Yes, Yes,
No, Yes, No, Yes, Yes and so on. (He probably wouldn't stop at this simple
question, however, since he would probably at least want to know why each moviegoer
liked it or not.) The answers here are qualitative; they are either yes or
no. The data accumulated are responses from a sample of moviegoers out of the
population of moviegoers who saw Movie X at the Playhouse.
A Washington traffic analyst interested in out-of-DC cars coming
into Washington during July 1948 might place traffic-counting clerks for
three one-hour periods on each odd-numbered day in July at each of the major
highway entrances into Washington to check license plates and record the
state (ignoring Virginia and Maryland, perhaps) for each car. The record for
each clerk would be a sequence of state names or initials (or tally marks
on a list of states). These clerks are drawing samples of out-of-DC cars out
of the population of out-of-DC cars going into Washington during July 1948.

As in the case of quantitative statistical data, we can have samples
of observations, each of which consists of pairs, triplets, or any number of
qualitative observations. For example, if a public opinion questionnaire with
ten opinion questions, with yes, no or no-opinion as possible answers on each
question, were submitted to each of 100 people, the results would be 100 sets,
with ten observations in each set.

For qualitative observations the problems of how much data to gather
are similar to those for quantitative observations. However, the problems of
condensing the data and analyzing them are, in general, simpler than
those for quantitative data. The problem of condensing the data in a
sample of qualitative observations is one of counting the frequencies, and computing
the percentages, with which observations fall into the various mutually
exclusive classes. For example, in a public opinion poll, the analysis of the
results on a particular question amounts to counting the number of answers
in the "yes", "no" and "no-opinion" classes and calculating the percentage
in each class. (See any one of the Gallup Poll newspaper releases.) (There
are, of course, other problems of cross-tabulation, such as
finding the percentage of those answering "no" to question B who answered
"yes" to question A, and so on.) If one has very large "properly drawn"
samples of qualitative observations, the analysis essentially stops with counting
and percentage analysis, and perhaps with presenting the results graphically. In
large "properly drawn" samples, the percentages calculated from the sample
(the sample statistics) will be approximately the same as the percentages for
the entire population (the population parameters), as one could check if one
had the population available. But if the samples are small, one is then faced
with the problem (just as in the case of small samples of quantitative data)
of worrying about the accuracy with which one can estimate the population
percentages by using the sample percentages.
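The counting and percentage work described above, including the cross-tabulation of one question against another, can be sketched in a few lines of Python. The eight ballots below are hypothetical, made up only to show the arithmetic, and the function name `class_percentages` is likewise an illustrative choice.

```python
from collections import Counter


def class_percentages(answers):
    """Percentage of answers falling in each mutually exclusive class."""
    counts = Counter(answers)
    n = len(answers)
    return {cls: 100 * count / n for cls, count in counts.items()}


# Hypothetical ballots: each pair is (answer to question A, answer to question B).
ballots = [
    ("yes", "no"), ("no", "no"), ("yes", "yes"), ("no-opinion", "no"),
    ("yes", "no"), ("no", "yes"), ("yes", "no"), ("no", "no"),
]

answers_a = [a for a, _ in ballots]
print(class_percentages(answers_a))
# {'yes': 50.0, 'no': 37.5, 'no-opinion': 12.5}

# Cross-tabulation: among those answering "no" to B, the percentage answering "yes" to A.
a_given_no_on_b = [a for a, b in ballots if b == "no"]
pct_yes_a = 100 * a_given_no_on_b.count("yes") / len(a_given_no_on_b)
print(pct_yes_a)  # 50.0
```

The cross-tabulated percentage is simply the ordinary percentage computed within a sub-sample, here the six ballots with "no" on question B.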
Everyone is familiar with graphical presentations of percentages or
frequencies calculated from qualitative data. Examples can be found in magazines,
newspapers, posters, folders, information booklets, etc. There are two or three
basic types of graphs or charts, of which there are many colored and pictorial
variations. The first is the bar chart with horizontal or vertical bars, of
which the following are examples:

[Figure 1.1. Two bar charts. (a) Motor-Vehicle Traffic Fatalities (in 1000's) in the U.S. in 1945 (from Statistical Abstract of the U.S., 1944-45): horizontal bars, scale 0 to 10, for accidents involving Pedestrians, Other Motor Vehicles, Running off Highway, Fixed Object, Railroad Train, and Other Categories. (b) Ordinary Life Insurance Death Benefits in the U.S. in 1946 (from The Institute of Life Insurance): vertical bars for 1941, 1942, 1943, 1944 and 1945, scale 0 to 1.0 billions of dollars.]

In many popular presentations these monotonous black bars are replaced
by rows of men, autos, piles of dollars, or other symbols suggesting the subject
matter which is being described. Often bar charts are used for presenting percentages
or totals for a series of years, one bar being used for each year.
There are many elaborations of bar charts from which one can make easy comparisons
of numbers in each of several categories for two or more years. For example,
if one had the percentages of deaths by various major causes for New Jersey for
1946 and 1947, one could construct a bar chart in which there is a pair of bars
side by side for each cause of death, one bar for 1946 and another for 1947.

The second type of chart is the familiar pie chart, which is particularly
useful in showing how the total or 100% of anything is divided up into certain
classes. For example, here is a typical pie chart (without the coloring or
cross-hatching):



[Figure 1.2. Pie chart: How the 1945 life insurance dividends (dollars) were used by policyholders in 1945 (Institute of Life Insurance data).]
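Drawing a pie chart such as Figure 1.2 rests on one proportion: each class gets a sector whose central angle is its share of the whole times 360 degrees. The Python sketch below shows that arithmetic; the three class totals are hypothetical and the function name `pie_sector_degrees` is an illustrative choice.

```python
def pie_sector_degrees(totals):
    """Central angle (degrees) of each sector: its share of the whole times 360."""
    whole = sum(totals.values())
    return {label: 360 * amount / whole for label, amount in totals.items()}


# Hypothetical amounts for three classes that together make up 100% of something.
shares = {"Class A": 50, "Class B": 30, "Class C": 20}
print(pie_sector_degrees(shares))
# {'Class A': 180.0, 'Class B': 108.0, 'Class C': 72.0}
```

The totals may be given as counts, dollars or percentages; because each is divided by their sum, the angles always add to 360 degrees.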
Exercise 1.


(These questions are listed mainly to provoke discussion, oral and written.)

1. If, in an opinion survey, you were asked to select a sample of 100 Princeton
undergraduates who are sons of Princeton alumni, how would you select the sample?
What is the population sampled? If these 100 undergraduates are mailed a questionnaire
and only 60 undergraduates fill it out and return it, what population
would be sampled?

2. Suppose you were asked to undertake a study of electric current consumption
in private homes of Princeton during December, 1947, on the basis of a sample of
300 households. How would you proceed with the selection of the sample of
addresses for such a purpose? What is the population sampled?

3. In investigating an allegedly biased six-sided die, suggest a procedure for 
getting a sample of numbers you would need. What would be the population for 
this die? 

4. In studying the burning life of a new type of 40-watt bulb being made in
small quantities for experimental purposes, a suitable sample of measurements
would consist of what? What is the population in such a study?

5. Consider the washers being made by an automatic machine for a certain kind
of precision instrument. What is the population of washers? How would you
select a sample of washers?

6. Indicate how you would undertake to get a sample of 200 sentence lengths in
studying the sentence lengths used by Margaret Mitchell in Gone With the Wind. What
is the population here?

7. Describe briefly how a radio audience researcher for Station WOR, investigating
the amount of WOR daytime listening in Trenton, might select a sample of
500 homes from telephone subscribers in Trenton. What is the population in this
example? If the investigator calls these 500 homes by telephone and gets a
response from only 400 of them, what population is actually being sampled?

8. In studying the tensile (breaking) strength of 12-gauge aluminum fence wire
being turned out continuously at a factory and cut into 1000-foot lengths (plus
a few feet) for coiling, suggest a practical procedure for getting a sample of
measurements which you could use. What would be the population?

9. The Fish and Game Commission of a certain state wants a sample of 1000 of its
population of licensed hunters to fill out a post card questionnaire. It has
dozens of books of license stubs from agents all over the state giving names and
addresses of the hunters. What would you consider to be a satisfactory method
of drawing a sample of names of licensed hunters? If only 750 of the hunters returned
the filled-out post cards, what population would be sampled? How do you
think this population would compare with the entire population of licensed hunters
of the state?

10. A small radio transmitter is designed so it may be used to generate a certain 
automatic signal. In making a detailed study of the length of this signal and 
how it varies for a single specified unit, how would you draw a sample of signals 
from a given unit? What is the population in this case? 

11. Suppose you had 2500 ballots filled out in a public opinion poll on a sample
of 2500 voters in City X. In the ballot there is an item consisting of a list of
6 potential presidential candidates, and each respondent is asked to check the one
name among them whom he would like to see as president. How would you condense
the results on this item for the 2500 ballots and present the results? Suppose
in a second item the respondent checks whether he is a Republican, Democrat or
Other. How would you present the presidential choice data so as to show how they
vary with political affiliation? What is the population in this example?

12. If you are asked to find out from a sample of cars the extent to which Princeton
car owners use the various brands of automobile tires, how would you proceed
to collect the data? How would you condense them and exhibit them when you got them?
What is the population of tires for the procedure you would use?

13. The percentages of life insurance policies sold by United States companies in
1944 in each of the following categories: Whole Life, Limited Payment Life, Endowment,
All Other, were 22, 36, 21, 21, respectively. The percentages for 1942 were
26, 36, 16, 22. Present this material graphically so it is easy to make comparisons
within each category for the two years. In this example, what is the
population? What is the sample?
14. If a family should keep an accurate record of the disposition of its income
for a year, and should find the number of dollars for each of the categories
food, clothing, rent, entertainment, savings and all other, how would you present
the data graphically? If they were collected for three successive years, how would
you present them graphically so as to make easy year-to-year comparisons within
each category?





CHAPTER 2. FREQUENCY DISTRIBUTIONS


2.1 Frequency Distributions for Ungrouped Measurements: An Example.

Raw statistical data, as pointed out in Section 1.1, usually consist
of a series of readings or measurements. As an example, we shall take the
weights (to the nearest .01 ounce) of zinc coating of 75 galvanized iron sheets
of a given size, as given in Table 2.1 (from A.S.T.M. Manual on Presentation of
Data, American Society for Testing Materials, 1947, p. 4):

TABLE 2.1

Weights (in ounces) of Zinc Coatings of 75 Galvanized Iron Sheets

1.47  1.60  1.62  1.60  1.52  1.38  1.77  1.73  1.55  1.70  1.53  1.60  1.38  1.60  1.37
1.48  1.64  1.51  1.46  1.53  1.63  1.59  1.54  1.50  1.53  1.60  1.34  1.54  1.60  1.57
1.58  1.58  1.32  1.62  1.47  1.42  1.45  1.34  1.44  1.66  1.48  1.48  1.55  1.64  1.57
1.56  1.44  1.39  1.35  1.65  1.53  1.62  1.38  1.53  1.46  1.47  1.44  1.34  1.47  1.58
1.43  1.49  1.64  1.56  1.50  1.54  1.61  1.57  1.42  1.67  1.57  1.47  1.75  1.63  1.47


These 75 zinc coating weights were measured on a sample of small iron
sheets of the same size by a chemical technique. The 75 measurements were a
sample of chemical determinations from a (theoretically) indefinitely large
population of chemical determinations which might have been made from galvanized
iron sheets at that time. Just by looking at the 75 measurements themselves,
one cannot tell whether the variation from 1.32 ounces to 1.77 ounces is due
mainly to variations in the weights of zinc actually deposited on the iron sheets,
to variations in chemical technique, or to both. This question would have to be
settled by an elaborate experiment. This kind of question always arises in
connection with measurements, although our common knowledge can sometimes answer
it for us. But we shall proceed as though the variations in the measurements are
due mainly to variations in the weights of the actual coatings.

The 75 measurements in Table 2.1 may therefore be considered as a
sample from an indefinitely large population of measurements that might have been
taken. Thus, if we had taken a further sample of 75 sheets we would have obtained
another set of 75 numbers, and so on for any number of samples of 75 we can
imagine as having been taken. We will consider this sampling problem later in
the course. Our job at present is to describe the information furnished by the
sample of 75 numbers in Table 2.1.

A simple graphical representation of these 75 numbers is given by the
dot frequency diagram shown in Figure 2.1, in which each dot represents an
observation. The graphical display in Figure 2.1, although it gives a quick picture
of the data and shows how they tend to "bunch up" in the middle, is ordinarily
not as useful for descriptive purposes as a cumulative graph like the one shown in
Figure 2.2, which can be readily plotted from the information shown in a dot frequency
diagram. In Figure 2.2, the ordinate of the "step-like" cumulative graph for
any given abscissa gives the frequency (or percent) of iron sheets having zinc
coating weight less than or equal to that particular abscissa. The left-hand
scale of ordinates gives cumulative frequency and the right-hand scale gives
cumulative percent. As an example, we note that the ordinate at the abscissa
1.58 is 53, as read on the frequency scale, or 70.7, as read on the percent scale.
This means that there are 53 iron sheets (or 70.7%) having zinc coating weights less
than or equal to 1.58 ounces.

Note that the ordinate at any abscissa where a jump occurs is to be taken
at the top of the jump. For example, the ordinate corresponding to 1.58 is 53,
not 50. Conversely, one may find the abscissa corresponding to any given
ordinate (read from either of the two scales). Strictly speaking, the only
ordinates for which abscissas are actually defined are those at which horizontal
"steps" occur, and then the abscissa for that ordinate is the abscissa
corresponding to the left-hand end of the "step". For example, the ordinate 10
(on the cumulative frequency scale) corresponds to the abscissa 1.39 (and not
1.42). However, if we take any value p on the percent scale, draw a horizontal
line to the right until we strike the step-like graph (either a vertical dotted
portion of the graph or the left-hand end of a "step"), and then draw a straight
line vertically downward until we strike the horizontal axis, the point of
intersection on the horizontal axis is called the p-th percentile. This means
that approximately p percent of the sample measurements are less than or equal
to the p-th percentile. For example, the 90th percentile is 1.64. The actual
number of cases less than or equal to 1.64 is 69 (or 92%), while the actual
number less than 1.64 is 66 (or 88%). The desired percentage (90%) falls between
these two percentages. The 50th percentile is called the median, and in this
case it is 1.53. Actually, the number of measurements less than or equal to the
median is 38 (or 50.7%), while the number less than 1.53 is 33 (or 44%).

[Figure 2.1: Dot Frequency Diagram of the Measurements in Table 2.1. The point marked $\bar{X} = 1.527$ is the mean of the distribution.]

The 25th percentile is called the lower quartile and the 75th
percentile is called the upper quartile. The difference between these two quartiles
is called the inter-quartile range; the interval between the two quartiles contains
approximately 50% of the sample measurements. The lower and upper quartiles in
Figure 2.2 are 1.46 and 1.60, respectively. The inter-quartile range is therefore
1.60 - 1.46 = .14.

The difference between the least and greatest measurements in the
sample is called the range of the sample. In the case of the data in Table 2.1 and
Figure 2.1, the range is 1.77 - 1.32 = .45.
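The reading of the cumulative graph described above (count the measurements at or below an abscissa; for a percent p, move right until the graph is struck, then drop to the axis) can be mimicked in a few lines of Python. This is a minimal sketch, not part of the original text: the helper names `cumulative_count`, `percentile` and `summary` are hypothetical, and `ZINC` holds the 75 values of Table 2.1 typed in from the table (their order does not affect the results). The p-th percentile is taken to be the smallest measurement whose cumulative percent is at least p, which is the point the horizontal-then-vertical construction picks out.

```python
import math

# The 75 zinc coating weights of Table 2.1 (ounces).
ZINC = [
    1.47, 1.60, 1.58, 1.56, 1.44,
    1.62, 1.60, 1.58, 1.39, 1.35,
    1.52, 1.38, 1.32, 1.65, 1.53,
    1.77, 1.73, 1.62, 1.62, 1.38,
    1.55, 1.70, 1.47, 1.53, 1.46,
    1.53, 1.60, 1.42, 1.47, 1.44,
    1.38, 1.60, 1.45, 1.34, 1.47,
    1.37, 1.48, 1.34, 1.58, 1.43,
    1.64, 1.51, 1.44, 1.49, 1.64,
    1.46, 1.53, 1.66, 1.56, 1.50,
    1.63, 1.59, 1.48, 1.54, 1.61,
    1.54, 1.50, 1.48, 1.57, 1.42,
    1.53, 1.60, 1.55, 1.67, 1.57,
    1.34, 1.54, 1.64, 1.47, 1.75,
    1.60, 1.57, 1.57, 1.63, 1.47,
]


def cumulative_count(data, x):
    """Ordinate of the step-like cumulative graph at abscissa x:
    the number of measurements less than or equal to x."""
    return sum(1 for v in data if v <= x)


def percentile(data, p):
    """Smallest measurement whose cumulative percent is at least p."""
    ordered = sorted(data)
    n = len(ordered)
    k = max(1, math.ceil(p * n / 100))  # rank where the graph first reaches p%
    return ordered[k - 1]


def summary(data):
    """Quartiles, median, inter-quartile range and range of a sample."""
    q1, med, q3 = (percentile(data, p) for p in (25, 50, 75))
    return {
        "lower quartile": q1,
        "median": med,
        "upper quartile": q3,
        "inter-quartile range": round(q3 - q1, 2),
        "range": round(max(data) - min(data), 2),
    }


print(summary(ZINC))
print(cumulative_count(ZINC, 1.39), percentile(ZINC, 90))
```

Run as written, it reports a lower quartile of 1.46, a median of 1.53, an upper quartile of 1.60, an inter-quartile range of .14 and a range of .45, the values read from the cumulative graph; `cumulative_count(ZINC, 1.39)` gives the ordinate 10, and the 90th percentile comes out as 1.64.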

Exercise 2.1.

(In these problems use graph paper ruled with 10 divisions per inch.)

1. A class of twenty students made the following grades on a mid-term test:
30, 26, 31, 20, 33, 40, 7, 36, 28, 16, 18, 24, 22, 21, 28, 22, 25, 46, 29, 27.
Make a dot frequency diagram and a cumulative graph of these grades. Determine
the range, upper and lower quartiles, inter-quartile range and median. Indicate
the quartiles and median on the cumulative graph.

2. The following table gives the Vickers Hardness numbers of 20 shell cases
(Pedlar Data):

66.3   61.3   62.7   60.4   60.2   64.5   66.5   62.9   61.5   67.8
65.0   62.7   62.2   64.8   65.8   62.2   67.5   67.5   60.9   63.8

Make a dot frequency diagram and a cumulative graph of these numbers. Determine
the range, upper and lower quartiles, inter-quartile range and median. Indicate
the quartiles and median on the cumulative graph.
3. The number of words per sentence in 60 sentences taken from a certain section
of Toynbee's A Study of History was as follows:

24   44   26   39   34   39   54   28   73   96
46   80   25   26   21   22   35    7   42   34
?1   36   17   41   55   20   23   22   11   36
48   15   27   44   16   58   21   70   50   40
39   43   42   20   35   60   18   12   69   40
28   12   15   20   43   19   19   65   41   66

Make a dot frequency diagram and a cumulative graph of these numbers. Determine
the range, upper and lower quartiles, inter-quartile range and median. Indicate
the quartiles and median on the cumulative graph.

4. The following table (from Grant) gives the yield point (in units of 1000
lb/sq. inch) for each of 40 steel castings:

64.5   67.5   67.5   64.5   66.5   73.0   68.0   75.0   68.5   71.0
67.0   69.0   68.0   69.6   72.0   71.0   69.5   72.0   71.0   68.6
66.5   67.5   69.0   68.0   65.0   63.5   65.5   65.0   70.0   68.5
68.5   70.5   64.5   67.0   66.0   63.5   62.0   70.0   71.0   68.6

Make a dot frequency diagram and a cumulative graph of these numbers. Determine
the range, upper and lower quartiles, inter-quartile range and median. Indicate
the quartiles and median on the cumulative graph.

5. Throw 5 dice 40 times and record the total number of points on each throw.
Make a dot frequency diagram of the results, and also a cumulative graph. Also
determine the range, median, lower and upper quartiles, and the inter-quartile
range. (For the purposes of this problem you can consider 5 throws of one die
equivalent to one throw of five dice in case you do not have five dice!)

6. Throw ten pennies 50 times and record the number of heads each time. Make
a dot frequency diagram and a cumulative graph of the results. Also determine
the range, the median, the lower and upper quartiles and the inter-quartile range.

7. Shuffle a pack of cards thoroughly and deal off a hand of 15 cards. Record
the number of honor cards. Return the cards to the pack and repeat. Do this 40 
times. Make a dot frequency diagram and a cumulative graph. Also determine the 
range, median, lower and upper quartiles and the inter-quartile range. 


2.2 Frequency Distributions for Grouped Measurements: An Example.

If there are more than about 25 observations in the sample of data, the
construction of dot frequency diagrams or cumulative graphs for ungrouped data
often involves more detail than is needed for practical purposes. Grouping the
data does not distort the pertinent numerical information they provide enough
to matter from a practical point of view. In this case one can use grouped
frequency distributions and cumulative grouped frequency distributions. To
define these grouped distributions we first construct a frequency table.
Returning to Table 2.1 (or looking at Figure 2.1), we find the least value in
the table to be 1.32 and the largest value to be 1.77. The range is .45. We now
divide the range into a number of equal intervals of convenient length. This
means that the length should be a "round number". The number of intervals is
usually taken to be between 10 and 25. A convenient interval for our example is
0.05, which gives us 10 class intervals or cells. We might also have used 0.04,
0.03, or 0.02, but we would usually avoid 0.0333, 0.036, and other such
inconvenient numbers.

We now take the cells to be 1.275 - 1.325, 1.325 - 1.375, 1.375 - 1.425,
and so on to 1.725 - 1.775. Note the following features of these cells:
(a) each is of length 0.05; (b) the boundaries of each cell end in a 5 and are
written with one more decimal than is used in the original data of Table 2.1;
(c) the upper boundary of any cell is the same as the lower boundary of the
succeeding cell (this is a convenience and will cause no ambiguity, since the
boundaries are written to one more decimal than is used in the original data in
Table 2.1).

The cells are constructed so that they have the following simple
midpoints, respectively: 1.30, 1.35, 1.40, and so on to 1.75. (We can have all
these nice properties since we took a round number for the cell length.)
The frequency table can be exhibited as Table 2.2. The cell boundaries
are shown in column (a) and the midpoints in column (b). The tallies of the
observations falling into the various cells are shown in column (c). The
frequencies are shown in column (d), the relative frequencies in column (e),
the cumulative frequencies in column (f), and the cumulative relative frequencies
in column (g). The last two columns are associated with the upper cell
boundaries, and their entries are sometimes dropped a half line in the table.

What we are really doing in this grouping procedure is to arbitrarily
assign the one measurement in Table 2.1 falling in the cell 1.275 - 1.325 the
value 1.30, arbitrarily assign the five measurements in Table 2.1 falling in the
cell 1.325 - 1.375 the value 1.35, and so on.


TABLE 2.2

Distribution for Grouped Measurements of Weights of 75 Zinc Coatings

(a)           (b)        (c)                     (d)        (e)        (f)         (g)
Cell          Cell       Tallied                            Relative   Cumulative  Cumulative Relative
Boundaries    Midpoints  Frequency               Frequency  Frequency  Frequency   Frequency

1.275-1.325   1.30       |                       1          .013       1           .013
1.325-1.375   1.35       |||||                   5          .067       6           .080
1.375-1.425   1.40       ||||| |                 6          .080       12          .160
1.425-1.475   1.45       ||||| ||||| |||         13         .173       25          .333
1.475-1.525   1.50       ||||| |||               8          .107       33          .440
1.525-1.575   1.55       ||||| ||||| ||||| ||    17         .227       50          .667
1.575-1.625   1.60       ||||| ||||| ||||        14         .187       64          .854
1.625-1.675   1.65       ||||| ||                7          .093       71          .947
1.675-1.725   1.70       |                       1          .013       72          .960
1.725-1.775   1.75       |||                     3          .040       75          1.000
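Once the frequencies in column (d) are known, the remaining columns of Table 2.2 follow mechanically. The Python sketch below is an illustration, not from the original (`frequency_table` is a hypothetical name): it rebuilds the boundaries, midpoints, relative, cumulative and cumulative relative frequencies from the lower boundary of the first cell, the cell length and the list of cell frequencies. It computes column (g) directly as the cumulative frequency divided by n, so one entry can differ in the last digit from a column built by adding the rounded entries of column (e): 64/75 = .853, whereas .013 + .067 + .080 + .173 + .107 + .227 + .187 = .854.

```python
def frequency_table(lower, c, freqs):
    """Rebuild the columns of a grouped frequency table.

    lower -- lower boundary of the first cell (1.275 for Table 2.2)
    c     -- cell length (0.05 for Table 2.2)
    freqs -- cell frequencies f_1, ..., f_k, i.e. column (d)
    """
    n = sum(freqs)
    rows = []
    cumulative = 0
    for i, f in enumerate(freqs):
        lo = lower + i * c                                   # lower boundary of cell i
        cumulative += f
        rows.append({
            "boundaries": (round(lo, 3), round(lo + c, 3)),  # column (a)
            "midpoint": round(lo + c / 2, 2),                # column (b)
            "frequency": f,                                  # column (d)
            "relative": round(f / n, 3),                     # column (e)
            "cumulative": cumulative,                        # column (f)
            "cumulative relative": round(cumulative / n, 3), # column (g)
        })
    return rows


for row in frequency_table(1.275, 0.05, [1, 5, 6, 13, 8, 17, 14, 7, 1, 3]):
    print(row)
```

Rounding is applied only for display; the cumulative counts themselves are exact integers.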


The frequencies [columns (d) and (e)] in Table 2.2 can be represented
graphically as a frequency histogram, as shown in Figure 2.3. Note that two scales
are provided for the ordinates: one scale refers to frequency, the other to
relative frequency expressed in terms of percent.

A more useful graphical representation of the material in Table 2.2 is
given by a cumulative polygon for grouped data, as shown by the heavy graph in
Figure 2.4. This graph, together with the two scales of ordinates, is the graphical
representation of columns (f) and (g) of Table 2.2. In this graph the points are
plotted above the upper cell boundaries and not above the cell midpoints.

The cumulative polygon provides a simple and quick graphical procedure
for approximately determining percentiles without going to the degree of detail
involved in using the cumulative graph in the ungrouped case (Figure 2.2). For
example, the 90th percentile is 1.65 ounces. (See dotted lines in Figure 2.4.)
This means that approximately 90 percent of the observations have values less
than or equal to 1.65. (The actual number of such observations less than or equal
to 1.65 ounces is 70, or 93.3%, as will be seen from Table 2.1 and Figure 2.1.
The actual number less than 1.65 is 69 (or 92%), so that our grouped percentile
does not necessarily have the bracketing characteristic of the ungrouped
percentile. In other words, 90% does not lie between 92% and 93.3%.) In general,
the larger the number of observations and the smaller the cells, the more
accurate these percentile approximations will be.
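Reading a percentile off the cumulative polygon amounts to linear interpolation between plotted points, since the polygon joins the points (upper cell boundary, cumulative frequency) by straight lines, starting from zero at the lowest boundary. A minimal Python sketch under that assumption, using the cell frequencies of Table 2.2 (the function name `polygon_percentile` is hypothetical):

```python
def polygon_percentile(lower, c, freqs, p):
    """Read the p-th percentile off a cumulative polygon.

    The polygon starts at (lower, 0) and rises linearly across each cell
    to (upper boundary, F_i); find the cell in which the cumulative
    frequency first reaches p percent of n and interpolate inside it.
    """
    n = sum(freqs)
    target = p * n / 100
    cumulative = 0
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            left = lower + i * c                     # lower boundary of this cell
            return left + (target - cumulative) / f * c
        cumulative += f
    return lower + len(freqs) * c                    # p = 100: last boundary


FREQS = [1, 5, 6, 13, 8, 17, 14, 7, 1, 3]            # column (d) of Table 2.2
for p in (25, 50, 75, 90):
    print(p, round(polygon_percentile(1.275, 0.05, FREQS, p), 3))
```

For the 90th percentile, median and upper quartile this gives 1.650, about 1.538 (1.54 to two decimals) and about 1.597, in line with the values read from Figure 2.4; the lower quartile comes out as about 1.451, against the 1.450 read from the graph.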

It will be seen from Figure 2.4 that the median is about 1.54, the
upper quartile 1.597 and the lower quartile 1.450. Note that the values of these
quartiles as determined from the graph in Figure 2.4 are slightly different from
the values (1.46 and 1.60) found from the cumulative graph in Figure 2.2. This is
due to the effect of grouping.

We have been talking about grouped and ungrouped data, yet the original
data itself could be considered as grouped with cell length equal to .01 if
the original measurements in Table 2.1 had been given to three or more decimal
places instead of two. The question of deciding how many figures or decimals to
keep in a set of measurements arises in most measurement problems, and has to be
settled in each case. In the present problem, it may be considered doubtful
whether the zinc coating measurements would really have any significance if
carried to three or more decimal places. And even if they did have significance,
there is such a wide variation of weights from one iron sheet to another that it
may be considered not worthwhile to have weights measured more accurately than
to two decimals.

If, therefore, we should consider the two-decimal measurements as
grouped from measurements to three or more decimals, we could construct a
cumulative polygon for cell length of .01 just as we have done for cell length
of .05 (Figure 2.4). Actually, the cumulative polygon for a grouping with cell
length of .01, the cells being centered at 1.32, 1.33, 1.34, and so on to 1.77,
could be constructed from the cumulative graph in Figure 2.2 as follows: take the
midpoint of each "unit part" (.01 inch) on each horizontal portion of the
step-like graph in Figure 2.2 and join these points by straight lines. The first
point will be on the axis of abscissas at 1.315, and the last one will be at the
top (100% point) of the ordinate erected at the abscissa 1.775. This cumulative
polygon would thus be obtained simply from the cumulative graph of Figure 2.2 by
trimming off the corners. The general course of the cumulative polygon
constructed from the cumulative graph in Figure 2.2 and that of Figure 2.4, as
one would expect, is similar, except that the cumulative polygon in Figure 2.4
is "smoother".

We can determine approximately from a cumulative polygon the number
of cases in the sample lying between two values of the abscissa. For example,
suppose we are interested in the number of cases in the sample having zinc
coating weight between 1.42 and 1.68 ounces. 1.68 is the 94.8th percentile, and
1.42 is the 14.7th percentile; 94.8 - 14.7 = 80.1%. This difference is approximately
the percentage of cases having zinc coatings between 1.42 and 1.68 ounces. The
actual number of cases is 59 (or 78.7%).


Exercise 2.2.

(Each student is expected to do at least one of problems Nos. 8, 9,
10, 11, 12, 13, 14 and to keep a record of the data in the order
in which it was obtained, as well as a record of all calculations.
The record will be needed in future problems.)

1. Each of 300 measurements is given in inches to three decimal places. The
largest measurement is 2.062" and the smallest is 1.997". Make up an outline
of a frequency table showing the cell boundaries and cell midpoints you would use.

2. Measurements on the crushing strength of 270 bricks (in pounds per square
inch) are given to the nearest 10 pounds. The largest measurement is 2070
and the smallest is 270. Set up an outline of a frequency table showing cell
boundaries and cell midpoints.

3. Suppose the cell midpoints for a given frequency distribution are 110, 115,
120, 125, 130, 135, 140, 145, 150, 155, 160. Make up an outline of a frequency
table showing cell boundaries.

4. Make up an outline of a frequency table you would use in presenting the
heights of all Princeton men of the Class of 1952, if the heights were available
to the nearest quarter of an inch.

5. Find the 10th, 40th, 60th and 80th percentiles graphically from Figure 2.4,
and compare them with the 10th, 40th, 60th and 80th percentiles as determined
graphically from Figure 2.2. Working from Figure 2.1, find the actual
percentages of cases less than or equal to each percentile in the two cases. Also
find the percentages of cases less than each percentile in the two cases.


Instructions for problems 6 - 14. In each of the problems Nos. 6 to 14
the following operations are to be carried out:

(a) Make a frequency table, showing frequency and cumulative
frequency distributions, relative frequency and relative
cumulative frequency distributions.

(b) Construct a frequency histogram.

(c) Construct a cumulative polygon and find the two
quartiles, the median and the inter-quartile range from
the graph.

6. In the following table are given the scholastic aptitude scores of the 66
departmental students of a certain department in the Class of 1938:

345   530   395   516   563   444   505   604   402   406   472
475   691   523   624   582   523   575   461   439   490   523
556   354   479   494   629   439   490   446   730   505   611
585   468   468   574   578   420   603   596   417   585   585
593   574   417   494   486   560   604   464   515   549   523
541   545   468   505   629   527   607   384   490   431   549

7. The following table gives 125 observations of a spectral line (Birge Data),
where only the last two digits of the reading are recorded. For example, the
first reading is actually 65.177 mm, of which only the 77 is recorded.

77   74   73   84   77   78   85   80   81   80
75   69   72   83   79   75   80   79   74   78
70   74   83   72   79   73   81   87   82   79
78   79   78   74   85   83   79   83   81   84
81   88   79   80   78   77   80   85   80   78
72   75   73   85   79   78   82   80   76   76
79   75   83   81   78   82   76   78   78   79
86   79   79   84   74   76   75   77   82   77
79   77   72   77   81   83   75   82   90   77
80   78   83   81   74   79   80   79   75   84
81   74   73   74   86   77   82   75   74   75
75   74   83   76   84   72   84   73   77   77
76   75   81   79   74


See instructions preceding Problem 6. 

8. Roll 5 dice and record the total number of dots that appear. Repeat this
50 times. See instructions preceding Problem 6.

9. Shuffle a deck of ordinary playing cards, deal off two hands of 13 cards
each (one is yours and the other belongs to your partner) and record the total
number of honor cards in the two hands. Repeat this 50 times. See instructions
preceding Problem 6.

10. Take 25 pennies (or any other kind of coin), put them in an empty pocket,
jingle them thoroughly, take them out, spread them on a table, count the number
of heads and record the result. Repeat these operations 60 times. See
instructions preceding Problem 6.

11. Take 50 thumbtacks, shake them up, throw them on a table, count the number
of tacks that fall point up, and record the result. Repeat this 75 times. See
instructions preceding Problem 6.

12. Pick up any book (containing no formulas), open the book at random, count
the number of e's on each full line on the 2-page spread and record the number
for each line. See instructions preceding Problem 6.

13. Take any mathematical table, such as a table of square roots, logarithms or
trigonometric functions, in which the entries are blocked off in sets of five.
Start with any block in the table you please, note the last digit in each of the
five numbers in the block, and add these five digits together. Record the sum
(which will be one of the 46 numbers 0, 1, 2, 3, ..., 45). Repeat this for the
next block, the next, etc., until you have 60 blocks. See instructions
preceding Problem 6.

14. Get 50 metal-rimmed price tags about 1 inch in diameter at any stationery
store. Mark 10 tags with the number 3 on each side, 10 tags with the number 4,
10 with the number 5, 10 with the number 6, and 10 with the number 7. Stir up 
this population of fifty tags thoroughly in a bowl or an empty coat pocket. 

Draw out a sample of 5 tags, and find the mean of the numbers on the five tags. 
Put the 5 tags back in the population and draw another sample of 5. Repeat 
this 80 times. See instructions preceding Problem 6. 


2.3 Cumulative Polygons Graphed on Probability Paper.

You have now plotted enough cumulative polygons to realize that they
are usually steeper in the middle than at the ends. This is a very general
characteristic of cumulative polygons. It is sometimes convenient to plot them
on a special kind of graph paper called probability graph paper so that they
become approximately straight lines. This is accomplished by stretching the
percentage scale for low percentages and for high percentages. If we plot the
cumulative frequency polygon shown in Figure 2.4 on probability graph paper, we
obtain the graph shown in Figure 2.5, which is much more nearly a straight line.
Cumulative polygons plotted on probability graph paper will be discussed in
greater detail in Section 8.3.


[Figure 2.5: Cumulative Polygon of Figure 2.4 Plotted on Probability Graph Paper. Horizontal axis: weight of coating in oz.; vertical scales: cumulative frequency and cumulative percentage.]
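What probability paper does can be imitated numerically under an assumption the text has not yet stated: that the paper is normal probability paper, whose percentage scale is ruled so that a cumulative proportion P sits at the height of the standard normal quantile z (the value with area P to its left under the normal curve). Plotting the cumulative polygon on such paper is then the same as plotting z against the upper cell boundaries on ordinary graph paper. The Python sketch below (the function name is hypothetical; `statistics.NormalDist` is in the standard library from Python 3.8) computes those points for Table 2.2. The 100% point is left out, because on such a scale it lies infinitely far up.

```python
from statistics import NormalDist


def probability_paper_points(lower, c, freqs):
    """Return (upper cell boundary, z) pairs for a cumulative polygon.

    z is the standard normal quantile of the cumulative relative
    frequency F_i / n.  Proportions of exactly 0 or 1 cannot be placed
    on (normal) probability paper, so they are skipped.
    """
    n = sum(freqs)
    quantile = NormalDist().inv_cdf
    points = []
    cumulative = 0
    for i, f in enumerate(freqs):
        cumulative += f
        share = cumulative / n
        if 0 < share < 1:
            upper = round(lower + (i + 1) * c, 3)
            points.append((upper, quantile(share)))
    return points


for x, z in probability_paper_points(1.275, 0.05, [1, 5, 6, 13, 8, 17, 14, 7, 1, 3]):
    print(f"{x:.3f}  {z:+.2f}")
```

Provided the assumption about the paper's ruling holds, plotting these pairs on ordinary graph paper should give a picture of the same shape as Figure 2.5.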
2.4 Frequency Distributions: General.

The general aspects of the procedures exemplified in Sections 2.1 and
2.2 should be noted. In general, we start off with a set of n measurements of
some kind. We may call these measurements $X_1, X_2, \ldots, X_n$, and in practice
they would be displayed in a table similar to Table 2.1. In Table 2.1, for
example, n = 75 and we could take $X_1 = 1.47, X_2 = 1.62, \ldots, X_{75} = 1.47$.

In their ungrouped form, these measurements may be represented
graphically by n dots in a dot frequency diagram, as exemplified in Figure 2.1.
Here each measurement in the set of n measurements is represented by a dot placed
above the value of that particular measurement wherever it occurs along the
X-axis. The dot frequency diagram gives a convenient pictorial arrangement of
the X's ranged from least to greatest, ties being indicated by dots placed in
vertical columns of 2 or more dots.

These n measurements can also be graphically represented, in their
ungrouped form, by a cumulative graph. In such a graph, the ordinate erected at
any given abscissa simply represents the number of the measurements (i.e., the
number of X's among the set $X_1, X_2, \ldots, X_n$) which are less than or equal
to that given abscissa. This graph can be constructed immediately from a dot
frequency diagram, since the n X's are arranged in order from least to greatest
in that graph.

The handling of the n individual measurements in ungrouped form becomes
too detailed and laborious for practical purposes if the total number of
different numerical values actually taken on by the n measurements is very large
(more than about 25 for practical purposes). This amounts to saying that working
with ungrouped data involves too much detail if the dot frequency diagram has more
than about 25 vertical columns of dots (each column containing one or more dots).

Suppose the dot frequency diagram contains more than about 25 columns
of dots (this can be determined easily by looking at the n measurements and
seeing how many different numerical values are to be found among them). We then
proceed to group the data to simplify matters. To do this, we look through the
n measurements and find the least and greatest values. We take the range (the
difference between the least and greatest values) and divide it into some
number of equal intervals, making sure that the interval length actually
chosen is a simple one in terms of the original units of measurement. This
usually means that it will not involve a lot of awkward decimals. When there are
a very few measurements far out on one or both ends of the distribution, it
will often pay to use a smaller cell length and hence more cells. We then
proceed to cut the X-axis up into cells, each cell being an interval of the
length chosen, and each cell having a simple and convenient midpoint. Once we
have decided the cell length and the midpoints of the cells, we then proceed to
arbitrarily assign every measurement falling in a given cell a value equal to the
value of X at the midpoint of the cell. The cell boundaries for a given cell
are placed one-half of the cell length on each side of the cell midpoint.
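The grouping rule just described (equal cells of a simple length c, boundaries half a cell on either side of each midpoint, every measurement given the value of its cell midpoint) can be written as a short Python function. This is a minimal sketch with hypothetical names: it takes the first midpoint, the cell length and the number of cells, and returns the midpoints together with the cell frequencies. Because the boundaries carry one more decimal than the data, no measurement falls exactly on a boundary, so the whole-number part of the distance from the lowest boundary, measured in cell lengths, identifies the cell.

```python
import math


def group(measurements, first_midpoint, c, k):
    """Tally measurements into k cells of length c.

    The i-th cell (counting from 0) has midpoint first_midpoint + i*c and
    boundaries half a cell length on either side; every measurement in it
    is, in effect, given the value of that midpoint.
    """
    lower = first_midpoint - c / 2          # lowest cell boundary
    freqs = [0] * k
    for x in measurements:
        i = math.floor((x - lower) / c)     # index of the cell containing x
        if not 0 <= i < k:
            raise ValueError(f"{x} lies outside the {k} cells")
        freqs[i] += 1
    midpoints = [first_midpoint + i * c for i in range(k)]
    return midpoints, freqs


# Example: the five student weights of Section 3.11, grouped into cells of
# length 5 lb centered at 135, 140, ..., 155 (boundaries 132.5, ..., 157.5).
print(group([141, 136, 157, 143, 138], 135, 5, 5))  # ([135, 140, 145, 150, 155], [1, 2, 1, 0, 1])
```

The function raises an error rather than silently dropping a measurement that falls outside the chosen cells, which signals that the first midpoint or the number of cells needs to be changed.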

We shall use the following notation:

k is the number of cells.

c is the length of each cell.

$x_1, x_2, \ldots, x_k$ are the midpoints of the first, second, ..., k-th cells, as counted from left to right.

$f_1, f_2, \ldots, f_k$ are the numbers or frequencies of the measurements $X_1, X_2, \ldots, X_n$ in the first, second, ..., k-th cells, respectively.

$F_1 = f_1$, $F_2 = f_1 + f_2$, $F_3 = f_1 + f_2 + f_3$, ..., $F_k = f_1 + f_2 + \cdots + f_k = n$ are the cumulative frequencies associated with the upper cell boundaries for the first, second, ..., k-th cells, respectively.

$x_1 \pm \frac{1}{2}c$, $x_2 \pm \frac{1}{2}c$, ..., $x_k \pm \frac{1}{2}c$ are the boundaries of the first, second, ..., k-th cells. In actual practice the boundaries are given to one more decimal than the original measurements $X_1, X_2, \ldots, X_n$, and they end in 5. This makes it possible to use the left-hand boundary of a cell as the right-hand boundary of the preceding cell without ambiguity as to which cell a given measurement belongs. These symbols are represented in Figure 2.6.

In this grouping operation, what we are really doing is this: every
one of the $f_1$ measurements in the sample $X_1, X_2, \ldots, X_n$ which falls in
the cell $x_1 \pm \frac{1}{2}c$ is arbitrarily given the value $x_1$; every one of
the $f_2$ measurements which fall in the cell $x_2 \pm \frac{1}{2}c$ is arbitrarily
given the value $x_2$; and so on for all of the cells.

These symbols can be arranged in the form of a general grouped
frequency table, as shown in Table 2.3.
We have deliberately put in the notation for cell No. i because we
shall frequently want to refer to a "typical" cell in the table, and we can do
this by talking about the i-th cell, its midpoint, etc.

The frequency column (d) and cumulative frequency column (f) in
Table 2.3 can be represented graphically as a frequency histogram and a
cumulative polygon, respectively, as shown in Figure 2.6. The value of X
corresponding to any given percent, say p, as determined by the cumulative
frequency polygon, is called the p-th percentile of the measurements.

In particular, the 50th percentile is the median, the 25th percentile
the lower quartile, the 75th percentile the upper quartile, and the difference
between the upper and lower quartiles is the inter-quartile range.

Exercise 2.4.

1. For each of the problems Nos. 6 to 14 of Exercise 2.2 which you did, plot the
cumulative polygons on probability paper.

2, Express F^^^ _ F^, F^ - F^, F^ - F^^^, in terms of f^, f^, ..., f^ , 

3. If the cell length is doubled, what, approximately, will happen to the
entries in the frequency column of a frequency table?

4. What is the largest possible change which can happen to a measurement when
changed from its original value to a cell midpoint? Illustrate when c = .05 and
the measurements are made to two decimals.

5. Referring to Table 2.3, how many measurements lie between $x_i - \frac{1}{2}c$
and $x_j + \frac{1}{2}c$? Express your answer in terms of the capital F's. Also
express it in terms of the lower case f's.


CHAPTER 3. SAMPLE MEAN AND STANDARD DEVIATION


3.1 Mean and Standard Deviation for the Case of Ungrouped Measurements.

In Sections 2.1 and 2.2 we have seen how a given sample of statistical
measurements in both the ungrouped and grouped forms can be condensed into tables
and graphs, and how information pertaining to percentiles can be obtained from
the graphs. For instance, the 50th percentile or median is the "middle" of the
distribution of measurements in a certain well-defined sense. The inter-quartile
range is an indication of the "scatter" or "spread" of the measurements in a
well-defined sense.

There are other important ways of describing the "middle" of the
distribution and the "spread" of the distribution. In this section we shall
discuss the arithmetic mean or simply the mean of the distribution of sample
measurements as another description of the "middle" of the distribution, and the
standard deviation of the measurements as another description of the "spread"
of the distribution.

3.11 Definition of the mean of a sample (ungrouped).

As a simple example, suppose the weights (in pounds) of five students
are 141, 136, 157, 143 and 138. The mean of this sample of five weights is the
sum of the weights divided by 5, i.e.,

$$\text{mean} = \frac{141 + 136 + 157 + 143 + 138}{5} = \frac{715}{5} = 143 \text{ lbs.}$$

In general, if $X_1, X_2, \ldots, X_n$ is a sample of n measurements, the
sample mean $\bar{X}$ of the X's is defined by the following relation:

(3.1)    $n\bar{X} = X_1 + X_2 + \cdots + X_n$.

We can write the sample sum $X_1 + X_2 + \cdots + X_n$ more compactly as
$\sum_{j=1}^{n} X_j$, where $\Sigma$ is the Greek letter capital sigma (chosen to
correspond to the first letter of the word "sum"). The expression
$\sum_{j=1}^{n} X_j$ is to be read "the sum of X sub j from j = 1 to j = n".
Hence, (3.1) can be written more compactly as

(3.2)    $n\bar{X} = \sum_{j=1}^{n} X_j$,

from which the formula for the mean $\bar{X}$ is written explicitly as

(3.3)    $\bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$.

We shall be using the sample sum $\sum_{j=1}^{n} X_j$ so often that it will be
convenient simply to call it S(X), read "sum of X", in which case we may write
(3.2) more briefly as

(3.2a)    $n\bar{X} = S(X)$,

and (3.3) more briefly as

(3.3a)    $\bar{X} = \frac{1}{n}S(X)$.

If we refer to our example of 5 weights, we would have n = 5, $X_1 = 141$,
$X_2 = 136$, $X_3 = 157$, $X_4 = 143$, $X_5 = 138$, $\sum_{j=1}^{5} X_j = 715$ (or
S(X) = 715), and applying formula (3.2) to the 5 weights would give
$5\bar{X} = 715$, or the mean is $\bar{X} = 715/5 = 143$ lbs.

The mean of the sample of 75 measurements in Table 2.1 is given by

$75\bar{X} = 1.47 + 1.62 + \cdots + 1.47 = 114.51$,

$\bar{X} \approx 1.527$ ounces.

In other words, applying formula (3.2) to the measurements in Table 2.1 gives
$75\bar{X} = 114.51$, and applying (3.3) gives $\bar{X} = 1.527$.
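Formulas (3.2a) and (3.3a) translate directly into code. A minimal Python sketch (the function names are illustrative, not from the text) applied to the five weights:

```python
def sample_sum(xs):
    """S(X): the sample sum X_1 + X_2 + ... + X_n."""
    return sum(xs)


def mean(xs):
    """Sample mean by formula (3.3a): X-bar = S(X) / n."""
    return sample_sum(xs) / len(xs)


weights = [141, 136, 157, 143, 138]          # pounds
print(sample_sum(weights), mean(weights))    # 715 143.0
```

The same formula with S(X) = 114.51 and n = 75 gives 114.51/75 = 1.5268, which rounds to the 1.527 quoted for Table 2.1.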

Suppose we take the difference between each X and the mean $\bar{X}$. We have
$X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$. If we add these differences
we get

$(X_1 - \bar{X}) + (X_2 - \bar{X}) + \cdots + (X_n - \bar{X}) = (X_1 + X_2 + \cdots + X_n) - n\bar{X} = 0$

because of (3.1). Hence, using summation notation, we have

(3.4)    $\sum_{j=1}^{n} (X_j - \bar{X}) = 0$;

in other words, the sum of the differences between each measurement in a sample
and the mean of all measurements in the sample is equal to zero.

Returning to our example of 5 weights, we note that the differences
between the measurements and the mean are (141 - 143), (136 - 143), (157 - 143),
(143 - 143) and (138 - 143), or -2, -7, +14, 0, -5, respectively, and that the
sum of these differences is zero.
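Relation (3.4) holds exactly, but with ordinary floating-point arithmetic the computed sum of deviations may come out as a tiny nonzero number. A small Python sketch (illustrative names) uses the standard `fractions.Fraction` type so that the check is exact; it accepts integers or `Fraction` values, such as decimals entered as strings:

```python
from fractions import Fraction


def deviations(xs):
    """Differences X_j - X-bar, computed exactly with rational arithmetic.

    xs must hold ints or Fractions (Fraction(a, b) does not accept floats).
    """
    xbar = Fraction(sum(xs), len(xs))
    return [x - xbar for x in xs]


weights = [141, 136, 157, 143, 138]
devs = deviations(weights)
print([int(d) for d in devs], sum(devs))   # [-2, -7, 14, 0, -5] 0

# Decimal data work too if entered as strings, e.g. a few values of Table 2.1.
some = [Fraction(s) for s in ("1.47", "1.62", "1.52", "1.77")]
print(sum(deviations(some)))               # 0
```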


3.12 Definition of the standard deviation of a sample (ungrouped).

Considering the example of the 5 weights again, suppose we square the difference between each measurement and the mean and add the squares. The standard deviation $s_X$ of the 5 weights is given by the following relation:

$(5-1)s_X^2 = (141-143)^2 + (136-143)^2 + (157-143)^2 + (143-143)^2 + (138-143)^2$

or

$4s_X^2 = 2^2 + 7^2 + 14^2 + 0^2 + 5^2 = 274$.

From this we find

$s_X^2 = 68.50$

or

$s_X = 8.27$.
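The same computation, following definition (3.5) literally, can be sketched in Python as follows. The names are illustrative, and s is printed to four decimals:

```python
import math

weights = [141, 136, 157, 143, 138]
n = len(weights)
mean = sum(weights) / n

# Definition (3.5): (n-1) s^2 is the sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in weights)
variance = sum_sq_dev / (n - 1)   # s^2, with divisor n - 1 rather than n
sd = math.sqrt(variance)          # s

print(sum_sq_dev, variance, round(sd, 4))  # 274.0 68.5 8.2765
```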

More generally, if $X_1, X_2, \dots, X_n$ is a sample of n measurements, the standard deviation $s_X$ of the sample is defined by

(3.5)   $(n-1)s_X^2 = (X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \dots + (X_n-\bar{X})^2$.

Using the summation notation this can be written more briefly as

(3.6)   $(n-1)s_X^2 = \sum_{i=1}^{n} (X_i-\bar{X})^2$.

The quantity $s_X^2$, the square of the standard deviation $s_X$, is called the variance of the sample. We shall not rewrite (3.5), (3.6) or any similar formula so as to give an explicit formula for the standard deviation $s_X$, for we can perfectly well talk about the standard deviation $s_X$ given by (3.5) or the variance $s_X^2$ given by (3.6) without having to write down two formulas.

From the point of view of computation, a formula which is often more convenient when a calculating machine is available can be found from (3.5). By squaring each term on the right-hand side of (3.5) we have

(3.7)   $(n-1)s_X^2 = (X_1^2 - 2X_1\bar{X} + \bar{X}^2) + (X_2^2 - 2X_2\bar{X} + \bar{X}^2) + \dots + (X_n^2 - 2X_n\bar{X} + \bar{X}^2)$,

or, collecting terms,

(3.8)   $(n-1)s_X^2 = (X_1^2 + X_2^2 + \dots + X_n^2) - 2\bar{X}(X_1 + X_2 + \dots + X_n) + n\bar{X}^2$.

But from formula (3.1) it is seen that

$n\bar{X} = X_1 + X_2 + \dots + X_n$,

which, when used in the right-hand side of (3.8), gives

(3.9)   $(n-1)s_X^2 = (X_1^2 + X_2^2 + \dots + X_n^2) - 2n\bar{X}^2 + n\bar{X}^2 = (X_1^2 + X_2^2 + \dots + X_n^2) - n\bar{X}^2$.

Using summation notation,

(3.10)   $(n-1)s_X^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$,

which is the desired formula. In practice, it is convenient to delay the division by n in calculating $\bar{X}$ and to calculate $s_X^2$ from the formula

(3.11)   $(n-1)s_X^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2$,

which may be written still more briefly as

(3.11a)   $(n-1)s_X^2 = S(X^2) - \frac{1}{n}[S(X)]^2$.

It should be noticed that $s_X$ and S(X) are two entirely different symbols with entirely different meanings: $s_X$ is the standard deviation of the sample, and S(X) is the sample sum, i.e., the sum of the measurements in the sample.

For example, if (3.11a) is applied to the 75 measurements of Table 2.1, we find

$74s_X^2 = [(1.47)^2 + (1.62)^2 + \dots + (1.47)^2] - \frac{1}{75}(114.51)^2 = 175.5849 - \frac{1}{75}(114.51)^2 = .7510$,

$s_X^2 = .01015$.
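As a sketch (the function names are hypothetical), the two routes to the variance can be compared: definition (3.6) and the machine formula (3.11a). For Table 2.1 only the totals quoted above are used, because the 75 individual values are not reproduced here:

```python
def variance_by_definition(values):
    """Sample variance from (3.6): sum of squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def variance_by_machine_formula(values):
    """Sample variance from (3.11a): [S(X^2) - S(X)^2 / n] / (n - 1)."""
    n = len(values)
    s_x = sum(values)                  # S(X)
    s_x2 = sum(v * v for v in values)  # S(X^2)
    return (s_x2 - s_x ** 2 / n) / (n - 1)


weights = [141, 136, 157, 143, 138]
print(variance_by_definition(weights), variance_by_machine_formula(weights))  # 68.5 68.5

# For Table 2.1, (3.11a) needs only the totals quoted in the text.
n_zinc, S_X, S_X2 = 75, 114.51, 175.5849
ss_zinc = S_X2 - S_X ** 2 / n_zinc     # 74 s^2, about .7510
var_zinc = ss_zinc / (n_zinc - 1)      # s^2, about .01015
print(round(ss_zinc, 4), round(var_zinc, 5))
```

On a computer, (3.11a) subtracts two nearly equal numbers whenever the mean is large compared with the scatter, which can lose floating-point precision. Shifting to a working origin first, as in Section 3.4, reduces that risk.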


You will have noticed that n-1 appears in formula (3.6) for the variance where you might have expected n. One reason for this is that although there are n squares on the right-hand side of (3.6), the sum of these squares actually reduces to n-1 squared quantities. To see this, consider the case of a sample of one measurement $X_1$. Here $\bar{X} = X_1$ and $(X_1 - \bar{X})^2 = 0$, so that formula (3.6) in this case is

$(1-1)s_X^2 = (X_1 - \bar{X})^2 = 0$.

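Now let us look at the case of a sample of two measurements $X_1$ and $X_2$. Since $\bar{X} = \frac{X_1 + X_2}{2}$, we have for the right-hand side of (3.6)

$(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 = \left(X_1 - \frac{X_1+X_2}{2}\right)^2 + \left(X_2 - \frac{X_1+X_2}{2}\right)^2 = \left(\frac{X_1 - X_2}{2}\right)^2 + \left(\frac{X_2 - X_1}{2}\right)^2 = \left(\frac{X_1 - X_2}{\sqrt{2}}\right)^2$.

Thus (3.6) reduces to

$(2-1)s_X^2 = \left(\frac{X_1 - X_2}{\sqrt{2}}\right)^2$,

which has only one squared term on the right.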

In the case of samples of three measurements $X_1$, $X_2$, $X_3$, the right-hand side of (3.6) can be written as

$(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + (X_3 - \bar{X})^2 = \left(\frac{X_1 - X_2}{\sqrt{2}}\right)^2 + \left(\frac{X_1 + X_2 - 2X_3}{\sqrt{6}}\right)^2$.

In other words, the sum of three squares reduces to the sum of two squares.

It is generally true that $\sum_{i=1}^{n} (X_i - \bar{X})^2$ can be written as the sum of n-1 (and no fewer) squares of quantities formed from differences among the sample measurements. For this reason we say that $\sum_{i=1}^{n} (X_i - \bar{X})^2$ has n-1 degrees of freedom, and we use (n-1) rather than n in formula (3.6) in defining $s_X^2$.
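The two- and three-measurement identities can be checked numerically with a small Python sketch. The three values are arbitrary (we borrow three of the weights), and `math.isclose` allows for floating-point rounding:

```python
import math

def sum_sq_dev(xs):
    """Right-hand side of (3.6): sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

x1, x2, x3 = 141.0, 136.0, 157.0   # any three numbers will do

# Two measurements: a single squared term.
two = ((x1 - x2) / math.sqrt(2)) ** 2
# Three measurements: two squared terms.
three = ((x1 - x2) / math.sqrt(2)) ** 2 + ((x1 + x2 - 2 * x3) / math.sqrt(6)) ** 2

print(math.isclose(sum_sq_dev([x1, x2]), two))        # True
print(math.isclose(sum_sq_dev([x1, x2, x3]), three))  # True
```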

In the preceding paragraphs we have been talking about sample means and standard deviations. We should remember that in most statistical problems we work with samples, rather than with the populations from which these samples are supposed to have been drawn. It is only rarely feasible or possible to obtain measurements on an entire population, and then only in the case of a finite population. If we did have measurements for an entire finite population we could, of course, compute the mean $\mu$ of the population just as we have calculated the mean $\bar{X}$ of a sample. Similarly for the variance.


Exercise 3.1.

1. Final inspection of nine aircraft before delivery revealed the following numbers of missing rivets: 8, 16, 14, 19, 11, 15, 8, 11, 21. Find the mean, variance and standard deviation of the number of missing rivets per plane.

2. The first ten sentences in Somervell's abridgement of Toynbee's A Study of History have the following numbers of words: 55, 19, 11, 39, 9, 12, 15, 28, 46, 24. Find the mean, variance and standard deviation of these sentence lengths.

3. Five 2000-piece lots of a certain electrical device contained the following numbers of defective pieces: 4, 9, 3, 2, 1. Find the mean, variance and standard deviation of the number of defectives.

4. If $X_1 = 1$, $X_2 = 6$, $X_3 = 4$, $X_4 = 7$, $X_5 = 3$, find the value of the following:


(a) X. 


(d) ^ (3X. V 2X 


3 3 


(■“) Z y' 


Pi J 


(s) Z V (X - 1) 


(«) Z (y - 2) 


(f) jz * y 

• -1 .1 tJ 


5. If $X_1, X_2, \dots, X_n$ are any numbers and if C is any constant, show that

$\sum_{i=1}^{n} C X_i = C \sum_{i=1}^{n} X_i$.

(Check this for the example $X_1 = 1$, $X_2 = 2$, ..., $X_n = n$, n = 8, C = 5.)

6. If $X_1, X_2, \dots, X_n$ and $Y_1, Y_2, \dots, Y_n$ are any two sets of numbers and A and B are any two constants, show that

$\sum_{j=1}^{n} (A X_j + B Y_j) = A \sum_{j=1}^{n} X_j + B \sum_{j=1}^{n} Y_j$.

(Check this for the example; A = -6, B = 9, n = 4, X = 2, X, = 4, X = 5, 

.Leo 


.Y^-2, Yg 


■1 Y == 0 Y 


3.) 


7. Suppose $X_1, X_2, \dots, X_n$ are n measurements which have mean 10 and standard deviation 3. If a new measurement Y is obtained from each X measurement by the equation Y = 4X + 2, what are the mean and variance of $Y_1, Y_2, \dots, Y_n$?

8. If the mean of measurements $X_1, X_2, \dots, X_n$ has value A and the standard deviation has value B, and if Y = aX + b, what are the mean and standard deviation of $Y_1, Y_2, \dots, Y_n$, expressed in terms of A, B, a and b?


3.2 Remarks on the Interpretation of the Mean and Standard Deviation of a Sample.

It was found that the mean of the sample of 75 measurements in Table 2.1 is 1.527. Turning to Figure 2.1, it will be seen that the mean (indicated by the arrow) is near the "middle" of the distribution of dots. Actually the mean is at the center of gravity of the distribution. By this we mean the following: if the dots in Figure 2.1 were all of equal weight and could be imagined as neatly arranged piles of blocks sitting on a thin board (the X-axis), then this arrangement would be just balanced by holding a knife-edge under the board at the mean, 1.527 ounces. The mean has another property. If we take each measurement minus the mean, we get 75 "discrepancies" or differences. Some of these are positive and some are negative, but, as we have seen from expression (3.4), the algebraic sum of all of the differences is equal to zero.

We have seen that the mean of a set of measurements gives us some information about where the "middle" or "center of gravity" of the set of measurements falls, but it gives no information about the "scatter" (or "amount of concentration") of the measurements. For example, the 5 measurements 14, 24.5, 25, 25.5 and 36 have the same mean as the 5 measurements 24, 24.5, 25, 25.5 and 26, but the two sets of measurements have widely different amounts of "scatter". One simple indication of the "scatter" of a set of measurements is the range, i.e., the largest value minus the smallest. In the two sets of measurements mentioned, the ranges are 22 and 2 respectively. If we always worked with fairly small samples of the same size n (as is the case in industrial quality control, where n = 4 and n = 5 are widely used sample sizes), then we would find the range very convenient. It is difficult, however, to compare a range for one sample size with that for a different sample size. For this and other reasons the range, in spite of its simplicity, convenience and importance, is used only in rather restricted situations, most widely in the field of industrial quality control.
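To see the contrast numerically, this sketch uses Python's standard `statistics` module on the two five-measurement sets above. Its `stdev` function uses the n - 1 divisor of (3.6):

```python
import statistics

samples = {
    "wide": [14, 24.5, 25, 25.5, 36],
    "narrow": [24, 24.5, 25, 25.5, 26],
}

summary = {}
for name, xs in samples.items():
    summary[name] = (
        statistics.mean(xs),   # same center for both sets
        max(xs) - min(xs),     # range: largest value minus smallest
        statistics.stdev(xs),  # s with divisor n - 1, as in (3.6)
    )
    print(name, summary[name])
```

Both sets have mean 25. The range and s both flag the wider set, but s uses every measurement rather than only the two extremes.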

The inter-quartile range defined in Section 2.1 is only useful when the samples are large enough to establish the quartiles fairly well. For n less than about 25 the quartiles are of doubtful value.

We clearly need a measure of scatter which can be used in samples of any size and which in some sense makes use of all the measurements in the sample. There are several measures of scatter that can be used for this purpose, and the most common of these is the standard deviation. For normal (or Gaussian) distributions, to be described roughly below and in greater detail in Chapter 8, the standard deviation is the "natural" measure of scatter.

Many samples of measurements yield cumulative polygons on probability paper which are nearly straight lines. This means that their frequency histograms are fairly symmetrical and bell-shaped, and that:

(a) About 95% of the measurements fall within a distance of $2s_X$ (two standard deviations) of $\bar{X}$,

(b) About 68% of the measurements fall within a distance of $s_X$ (one standard deviation) of $\bar{X}$,

(c) About 50% of the measurements fall within a distance of $\frac{2}{3}s_X$ ($0.6745s_X$ to be more precise) of $\bar{X}$.

Consider the zinc coating measurements of Table 2.1, which gave a fairly straight cumulative polygon, as you remember from Figure 2.5. For these we found $\bar{X} = 1.527$ and $s_X = .101$.

Within 2(.101) of 1.527 (i.e., between 1.325 and 1.729) there are 71 out of 75 measurements, or 94.7% instead of 95%.

Within .101 of 1.527 (i.e., between 1.426 and 1.628) there are 52 out of 75 measurements, or 69.3% instead of 68%.

Within $\frac{2}{3}$(.101) of 1.527 (i.e., between 1.460 and 1.594) there are 35 out of 75 measurements, or 46.7% instead of 50%.

In all cases the agreement between the "actual" percentages and the "theoretical" percentages is good.

Those distributions of indefinitely large populations whose cumulative polygons are straight lines on probability paper are called normal or Gaussian distributions, and they are of great importance in statistics. Since large samples of many kinds of actual measurements give nearly straight cumulative polygons on probability paper, the theory of normal distributions has important practical consequences. In particular, we shall often want to select a normal distribution which "fits" our sample of measurements. Fitting such a distribution depends on knowledge of the mean (which tells where to center the distribution) and the standard deviation (which tells how widely to spread the distribution). We shall show how to fit a normal distribution in Chapter 8.
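The "theoretical" percentages in (a), (b) and (c) can be recomputed from a standard result: a normal population has a fraction erf(k/√2) of its values within k standard deviations of its mean. The sketch below is an illustration only, ahead of the fuller treatment in Chapter 8. It uses Python's `math.erf` and sets the results beside the zinc-coating counts quoted above:

```python
import math

def normal_fraction_within(k):
    """Fraction of a normal population within k standard deviations of
    its mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

# Theoretical percentages behind statements (a), (b) and (c).
for k in (2, 1, 0.6745):
    print(k, round(100 * normal_fraction_within(k), 1))

# Observed zinc-coating percentages quoted in the text.
observed = {"2s": 71 / 75, "1s": 52 / 75, "(2/3)s": 35 / 75}
print({key: round(100 * v, 1) for key, v in observed.items()})
```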

3.3 The Mean and Standard Deviation for the Case of Grouped Data.

Determining the mean and standard deviation of a sample of data by the formulas of Section 3.1 is likely to be unnecessarily laborious for a large sample. Large samples of observations are usually treated as grouped data. A sample of more than about 50 measurements should almost surely be treated as grouped data when computing by hand, and a sample of more than about 100 measurements when using a computing machine.

3.31 An example.

To see how we should proceed in calculating the mean and standard deviation of a grouped distribution, let us return to the data in Table 2.2 as an example. The only quantities needed from Table 2.2 in computing the mean and standard deviation are the cell midpoints and the frequencies, which are rewritten as columns (a) and (b) of Table 3.1.

Remember that in grouping the data of Table 2.1 and arranging it in Table 2.2 we are arbitrarily assigning the value 1.30 to the one measurement falling in the cell 1.30 ± .025, the value 1.35 to the 5 measurements falling in the cell 1.35 ± .025, and so on. Thus, when the data are grouped, we consider that we have the following measurements: one measurement with the value 1.30, 5 with the value 1.35, 6 with the value 1.40, and so on. The mean $\bar{X}$ of the 75 measurements is obtained by applying formula (3.2). We have

$75\bar{X} = (1.30) + (1.35 + 1.35 + 1.35 + 1.35 + 1.35) + (1.40 + 1.40 + 1.40 + 1.40 + 1.40 + 1.40) + \dots + (1.75 + 1.75 + 1.75)$,

$75\bar{X} = 1(1.30) + 5(1.35) + 6(1.40) + \dots + 3(1.75) = 114.55$.

Hence, the mean $\bar{X}$ is

$\bar{X} = \frac{114.55}{75} = 1.527$.

Notice that the quantity on the right-hand side of the expression for $75\bar{X}$ is the sum of the entries in column (c) in Table 3.1.

TABLE 3.1

Table Showing Calculations for Obtaining Mean and Standard Deviation
of Sample from Grouped Data

   (a)          (b)           (c)              (d)
   Cell
   Midpoint     Frequency
   $x_i$        $f_i$         $f_i x_i$        $f_i x_i^2$

   1.30           1             1.30             1.6900
   1.35           5             6.75             9.1125
   1.40           6             8.40            11.7600
   1.45          13            18.85            27.3325
   1.50           8            12.00            18.0000
   1.55          17            26.35            40.8425
   1.60          14            22.40            35.8400
   1.65           7            11.55            19.0575
   1.70           1             1.70             2.8900
   1.75           3             5.25             9.1875

   Total        n = 75        S(X) = 114.55    $S(X^2)$ = 175.7125


We find the standard deviation of the grouped measurements by applying formula (3.11) (or the briefer formula (3.11a)) to the grouped measurements:

$74s_X^2 = (1.30)^2 + [(1.35)^2 + (1.35)^2 + (1.35)^2 + (1.35)^2 + (1.35)^2] + \dots + [(1.75)^2 + (1.75)^2 + (1.75)^2] - \frac{1}{75}(114.55)^2 = 175.7125 - \frac{1}{75}(114.55)^2$.












Hence

$s_X^2 = .01022$

and

$s_X = .101$.

Note that the sum of the squared terms in the expression for $74s_X^2$ is the total of the entries in column (d) of Table 3.1.

In general, there is a slight difference between $\bar{X}$ as calculated from the ungrouped measurements and $\bar{X}$ as calculated from the grouped measurements. In the example of the zinc coatings, the values of $\bar{X}$ for the grouped and ungrouped cases are $\frac{114.55}{75}$ and $\frac{114.51}{75}$, respectively. These two quotients have the value 1.527 to three decimal places. Similarly, there are, in general, differences between the values of $s_X^2$ as calculated from the grouped and ungrouped measurements. In the example of the zinc coatings, the values of $s_X^2$ are .01022 and .01015, respectively, for the grouped and ungrouped cases. These discrepancies are due to the grouping operation. In the present example, grouping the data into 10 cells of length .05 ounces does not change the value of $\bar{X}$ or $s_X$ to any practical extent. In any given problem, decreasing the size of the cells tends to decrease the effect of grouping.
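Here is the Table 3.1 arithmetic as a Python sketch; the names are illustrative. Floating-point sums may differ from the hand totals in the last binary digits, so comparisons should use a tolerance rather than exact equality:

```python
# Table 3.1: cell midpoints x_i (ounces) and frequencies f_i.
midpoints = [1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65, 1.70, 1.75]
freqs = [1, 5, 6, 13, 8, 17, 14, 7, 1, 3]

n = sum(freqs)                                           # 75
S_X = sum(f * x for x, f in zip(midpoints, freqs))       # total of column (c)
S_X2 = sum(f * x * x for x, f in zip(midpoints, freqs))  # total of column (d)

mean = S_X / n
variance = (S_X2 - S_X ** 2 / n) / (n - 1)               # (3.11a)

# Ungrouped values from Section 3.1, for comparison.
mean_ungrouped, variance_ungrouped = 114.51 / 75, 0.7510 / 74
print(round(mean, 3), round(variance, 5))
print(round(mean_ungrouped, 3), round(variance_ungrouped, 5))
```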


3.32 The general case.

To find expressions for the mean $\bar{X}$ and standard deviation $s_X$ of a general grouped frequency distribution, we consider the cell midpoints and frequencies in a general grouped frequency table, as given in columns (a) and (b) of Table 3.2. We construct column (c) of values of $f_i x_i$, i.e., products of cell midpoints and frequencies. Similarly, we construct column (d) of values of $f_i x_i^2$, i.e., products of squared cell midpoints and frequencies.

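The mean $\bar{X}$ of the grouped measurements is obtained by applying formula (3.2a) to the n individual grouped measurements. When (3.2a) is applied we get

$n\bar{X} = \underbrace{(x_1 + x_1 + \dots + x_1)}_{f_1\ \text{terms}} + \underbrace{(x_2 + x_2 + \dots + x_2)}_{f_2\ \text{terms}} + \dots + \underbrace{(x_k + x_k + \dots + x_k)}_{f_k\ \text{terms}}$,

or

$n\bar{X} = f_1 x_1 + f_2 x_2 + \dots + f_k x_k$,

which may be written more compactly as

$n\bar{X} = \sum_{i=1}^{k} f_i x_i$,

from which $\bar{X}$ may be more explicitly written as

(3.13)   $\bar{X} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i$.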


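TABLE 3.2

General Table for Finding Mean and Standard Deviation
of Sample from Grouped Data

   (a)          (b)           (c)                                 (d)
   Cell
   Midpoint     Frequency
   $x_i$        $f_i$         $f_i x_i$                           $f_i x_i^2$

   $x_1$        $f_1$         $f_1 x_1$                           $f_1 x_1^2$
   $x_2$        $f_2$         $f_2 x_2$                           $f_2 x_2^2$
   .            .             .                                   .
   $x_i$        $f_i$         $f_i x_i$                           $f_i x_i^2$
   .            .             .                                   .
   $x_k$        $f_k$         $f_k x_k$                           $f_k x_k^2$

   Total        n             $S(X) = \sum_{i=1}^{k} f_i x_i$     $S(X^2) = \sum_{i=1}^{k} f_i x_i^2$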









Thus the sample sum for grouped data is

(3.14)   $S(X) = \sum_{i=1}^{k} f_i x_i$,

which is simply the sum of the entries in column (c) of Table 3.2.

The standard deviation is given by applying formula (3.11a) to the n individual grouped measurements. We have already seen that the value of S(X) for grouped measurements is given by formula (3.14). In the case of $S(X^2)$, we similarly have

(3.15)   $S(X^2) = \sum_{i=1}^{k} f_i x_i^2$,

which is given by the sum of the entries in column (d) of Table 3.2. Hence, the formula for the standard deviation for grouped data is given by

(3.16)   $(n-1)s_X^2 = \sum_{i=1}^{k} f_i x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i x_i\right)^2$.
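Formulas (3.13) and (3.16) can be packaged as a single function, sketched below. The three-cell table is made up purely for checking: it confirms that the grouped formulas agree with writing out every grouped measurement and applying Python's `statistics.mean` and `statistics.variance` (the latter divides by n - 1):

```python
import math
import statistics

def grouped_mean_variance(midpoints, freqs):
    """Mean (3.13) and variance (3.16) of a grouped frequency table."""
    n = sum(freqs)
    s_x = sum(f * x for x, f in zip(midpoints, freqs))       # (3.14)
    s_x2 = sum(f * x * x for x, f in zip(midpoints, freqs))  # (3.15)
    return s_x / n, (s_x2 - s_x ** 2 / n) / (n - 1)

# A small hypothetical table, used only to check the formulas.
mids, fs = [2, 4, 6], [3, 5, 2]
mean, var = grouped_mean_variance(mids, fs)

# Same answer as listing every grouped measurement individually.
expanded = [x for x, f in zip(mids, fs) for _ in range(f)]
print(mean, var)
print(statistics.mean(expanded), statistics.variance(expanded))
```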


Exercise 3.3.

1. "On" temperatures at which a certain thermostatic switch operated in 25 trials were as follows (Grant data):

55, 54, 55, 51, 53, 55, 55, 54, 51, 56, 55, 55, 54, 53, 55, 54, 53, 50, 52, 56, 55, 55, 50, 56, 55

Find the mean and variance of the "on" temperatures.

2. Suppose 1000 pieces of enameled ware are inspected and the number of surface defects on each piece is recorded. The distribution of number of defects is as follows:

   No. of defects       Frequency
   (cell midpoint)

   0                    600
   1                    310
   2                     75
   3                     13
   4                      2

Find the mean and variance of the numbers of defects.

3. In a germination experiment, 80 rows of cabbage seed with 10 seeds per row were incubated. The distribution of number of cabbage seeds which germinated per row was as follows (Tippett data):

   No. seeds germinated       Frequency
   per row                    of rows
   (cell midpoint) $x_i$      $f_i$

   0                           6
   1                          20
   2                          28
   3                          12
   4                           8
   5                           6

Find the mean and variance of the numbers of seeds germinating per row.

4. A pair of cheap plastic dice were thrown 100 times and the distribution of the total number of dots obtained was as follows:

   No. of dots per throw      Frequency
   (cell midpoint) $x_i$      $f_i$

   2                           0
   3                           7
   4                           9
   5                          19
   6                          16
   7                          13
   8                          11
   9                           4
   10                          6
   11                         11
   12                          4

Find the mean and variance of the numbers of dots obtained.
3.4 Simplified Computation of Mean and Standard Deviation.

For computational purposes it often saves a great deal of labor to calculate the mean and standard deviation after changing the measurements to a new scale. The new scale may have a new origin, or both a new unit and a new origin. First, we shall consider the simplest case: the use of a working origin.

3.41 Effect of adding a constant.

Suppose we convert the X measurements $X_1, X_2, \dots, X_n$ to new measurements $Y_1, Y_2, \dots, Y_n$ by adding any constant a, i.e., by using the following relation between any X value and its corresponding Y value:

$Y_i = X_i + a$.

The constant a may be either positive or negative. Let us consider the relation between the mean $\bar{Y}$ of the Y measurements and the mean $\bar{X}$ of the X measurements. We have

$S(Y) = \sum_{i=1}^{n} (X_i + a) = \sum_{i=1}^{n} X_i + na$,

or

$S(Y) = S(X) + na$.

Dividing by n and using (3.3), we find

(3.17)   $\bar{Y} = \bar{X} + a$.

Hence, the effect of adding a constant a to each of a set of X measurements is to add a to the mean of the measurements. This statement also holds for grouped measurements.

Next we see that for any measurement

$Y_i - \bar{Y} = (X_i + a) - (\bar{X} + a) = X_i - \bar{X}$,

which means that the deviations from the sample mean are unchanged by adding a constant a to each measurement. Hence, the squares of these deviations remain unchanged, and therefore we have

(3.18)   $\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2$,

or, written another way,

(3.19)   $S(X^2) - \frac{1}{n}[S(X)]^2 = S(Y^2) - \frac{1}{n}[S(Y)]^2$.

Expressed still another way, if $s_X^2$ is the variance of the X measurements and $s_Y^2$ is the variance of the Y measurements, we have

$s_X^2 = s_Y^2$.

Therefore, adding a constant a to each X measurement does not change the variance or standard deviation of the measurements.

3.42 Examples of using a working origin.

As a simple example of the use of a working origin in finding the mean and standard deviation of a set of measurements, let us return to the example of the 5 weights mentioned early in Section 3.1. The five X measurements are 141, 136, 157, 143 and 138. Let us take the constant a to be -140. Using the relation Y = X - 140, we find the five Y measurements to be +1, -4, +17, +3, -2, respectively.

First, we consider the mean. We have

$S(Y) = +15$,   $\bar{Y} = \frac{15}{5} = 3$,

and therefore, from (3.17),

$3 = \bar{X} + (-140)$,

$\bar{X} = 143$,

as found in Section 3.1.

Now consider the variance. To calculate the variance of the Y measurements we need S(Y) and $S(Y^2)$. Their values are S(Y) = 15 and $S(Y^2) = 319$. Hence

$4s_Y^2 = 319 - \frac{1}{5}(15)^2 = 274$,

$s_Y^2 = 68.5$,

$s_Y = 8.27 = s_X$,

as found in Section 3.1.
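Here is the working-origin example as a Python sketch, with illustrative names. The X mean and variance are recovered from the Y sums through (3.17) and (3.19):

```python
weights = [141, 136, 157, 143, 138]
a = -140                                  # constant added: Y = X - 140
ys = [x + a for x in weights]

n = len(ys)
S_Y = sum(ys)
S_Y2 = sum(y * y for y in ys)
mean_x = S_Y / n - a                      # (3.17): Y-bar = X-bar + a
var_x = (S_Y2 - S_Y ** 2 / n) / (n - 1)   # (3.19): s_X^2 = s_Y^2

print(ys, S_Y, S_Y2)   # [1, -4, 17, 3, -2] 15 319
print(mean_x, var_x)   # 143.0 68.5
```

The squares of the small Y values are far easier to form by hand than the squares of values near 140, which is the whole point of the working origin.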
A working origin may be used in a similar way for grouped measurements. Let us return to our example of zinc coatings, and use 1.50 ounces as our working origin. Table 3.3 shows the computation. (Ignore columns (f), (g) and (h) for the moment.)

To determine the value of $\bar{X}$, we note that

$\bar{Y} = \frac{S(Y)}{n} = \frac{2.05}{75} = .0273$,

and since

$\bar{Y} = \bar{X} - 1.50$,

we get

$\bar{X} = 1.50 + .0273 = 1.527$,

as found in Section 3.1 directly from the X measurements.

To determine the value of $s_X^2$ we have, from (3.19),

$(n-1)s_X^2 = (n-1)s_Y^2 = S(Y^2) - \frac{1}{n}[S(Y)]^2$.

Substituting values,

$74s_X^2 = .8125 - \frac{1}{75}[2.05]^2 = .7565$.

Hence,

$s_X^2 = .01022$

and

$s_X = s_Y = .101$,

as found in Section 3.3.

TABLE 3.3

Table Showing Computations Required in Computing Mean and Variance

Columns (f), (g), (h) in Table 3.3 enable us to check our computations, for we have

$\sum_{i=1}^{k} f_i (y_i + 1)^2 = \sum_{i=1}^{k} f_i y_i^2 + 2\sum_{i=1}^{k} f_i y_i + \sum_{i=1}^{k} f_i$,

which may be written more briefly as

(3.20)   $S[(Y+1)^2] = S(Y^2) + 2S(Y) + n$.

Applying this to Table 3.3, we have

$79.9125 = .8125 + 2(2.05) + 75 = 79.9125$.


If (3,20) holds for any given table, we can be practically certain (although not 
absolutely certain) that our computations are correct. If it fails to hold, we 
can be sure that we have made an error. 
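Here is a Python sketch of the check (3.20) for the zinc-coating data with working origin 1.50. Storing the cell values as exact `fractions.Fraction` objects is our own implementation choice; it lets the check be an exact equality, just as in hand arithmetic. The sketch reproduces only the column totals quoted above, not the missing body of Table 3.3:

```python
from fractions import Fraction as F

# Zinc-coating cells of Table 3.1, kept as exact fractions.
midpoints = [F(m) for m in ("1.30", "1.35", "1.40", "1.45", "1.50",
                            "1.55", "1.60", "1.65", "1.70", "1.75")]
freqs = [1, 5, 6, 13, 8, 17, 14, 7, 1, 3]
origin = F("1.50")                                        # working origin

ys = [x - origin for x in midpoints]                      # Y = X - 1.50
n = sum(freqs)
S_Y = sum(f * y for y, f in zip(ys, freqs))
S_Y2 = sum(f * y * y for y, f in zip(ys, freqs))
S_Y1 = sum(f * (y + 1) ** 2 for y, f in zip(ys, freqs))   # S[(Y+1)^2]

check_ok = S_Y1 == S_Y2 + 2 * S_Y + n                      # (3.20)
mean_x = origin + S_Y / n                                  # from (3.17)
var_x = (S_Y2 - S_Y ** 2 / n) / (n - 1)                    # from (3.19)

print(float(S_Y), float(S_Y2), float(S_Y1), check_ok)      # 2.05 0.8125 79.9125 True
print(round(float(mean_x), 3), round(float(var_x), 5))
```

With ordinary floats the same check would have to allow a small tolerance.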


3.43 Fully coded calculation of means, variances and standard deviations.

While we are at this job of trying to simplify the computation of the mean and standard deviation, there is one more step we may as well take, at the expense of over-emphasizing the problem of calculating means and standard deviations. This step simplifies such calculation as much as possible.

To do this, we make a change in scale in addition to a change in origin (as discussed in Sections 3.41 and 3.42), so that our deviations from the working origin are simple to square and multiply.

Suppose Z measurements are defined in terms of X measurements by the relation

$X_j = a + bZ_j$,

where a and b are any constants. We may refer to the Z measurements as coded values of the X measurements.

Summing, we have

$S(X) = na + bS(Z)$,

and dividing by n, we have

(3.21)   $\bar{X} = a + b\bar{Z}$.

Now consider the deviations of $X_j$ from its mean $\bar{X}$. We have

$X_j - \bar{X} = (a + bZ_j) - (a + b\bar{Z})$

or

$X_j - \bar{X} = b(Z_j - \bar{Z})$.

Squaring and summing,

(3.22)   $\sum_{j=1}^{n} (X_j - \bar{X})^2 = b^2 \sum_{j=1}^{n} (Z_j - \bar{Z})^2$.

But

$\sum_{j=1}^{n} (Z_j - \bar{Z})^2 = (n-1)s_Z^2$

and

$\sum_{j=1}^{n} (X_j - \bar{X})^2 = (n-1)s_X^2$.

Therefore, from (3.22) we find

(3.23)   $s_X^2 = b^2 s_Z^2$

or, taking b positive,

(3.24)   $s_X = b\,s_Z$.

Hence we have the following rule, expressed by (3.21) and (3.24):

If X and Z are measurements satisfying the relation $X_j = a + bZ_j$, then the mean of the X's is a plus b times the mean of the Z's, and the standard deviation of the X's is b times the standard deviation of the Z's.

This rule holds for both grouped and ungrouped measurements. However, it does not pay to code measurements unless there are enough of them to justify calculation as grouped data. When we have grouped data and wish to use coded values, it is natural to choose the constant b equal to the cell length, and to choose a to be a cell midpoint near the middle of the grouped frequency distribution.

Example.

Returning to the example of the zinc coating measurements, the essential constituents for coded computation are shown in Table 3.4.

Note that we do not need a column for $(z_i + 1)$, since the values of $(z_i + 1)^2$ can easily be written down by sight.

In this example,

a = 1.50,   b = .05.

We have S(Z) = 41 and $S(Z^2) = 325$. Therefore

$\bar{Z} = \frac{41}{75} = .547$

and

$74s_Z^2 = 325 - \frac{1}{75}(41)^2 = 302.5867$,

or

$s_Z^2 = 4.0890$,

$s_Z = 2.0221$.

Therefore, from (3.21) we find

$\bar{X} = 1.50 + (.05)(.547) = 1.527$,

and from (3.24),

$s_X = (.05)(2.0221) = .101$.

These values of $\bar{X}$ and $s_X$ agree with those found by the longer methods of Sections 3.1 and 3.3.

TABLE 3.4

Table Showing Computations Needed for Fully Coded Computation
of Mean and Variance of a Frequency Distribution

   (a)        (b)         (c)      (d)          (e)            (f)            (g)
   Cell
   Midpoint   Frequency
   $x_i$      $f_i$       $z_i$    $f_i z_i$    $f_i z_i^2$    $(z_i+1)^2$    $f_i(z_i+1)^2$

   1.30         1          -4        -4           16              9              9
   1.35         5          -3       -15           45              4             20
   1.40         6          -2       -12           24              1              6
   1.45        13          -1       -13           13              0              0
   1.50         8           0         0            0              1              8
   1.55        17           1        17           17              4             68
   1.60        14           2        28           56              9            126
   1.65         7           3        21           63             16            112
   1.70         1           4         4           16             25             25
   1.75         3           5        15           75             36            108

   Total      n = 75               S(Z) = 41    $S(Z^2)$ = 325                $S[(Z+1)^2]$ = 482

Check: $S[(Z+1)^2] = S(Z^2) + 2S(Z) + n$
       482 = 325 + 2(41) + 75
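The fully coded computation of Table 3.4 can be sketched in Python as follows. The names are illustrative, and the coded values are rounded to integers to remove floating-point noise in (x - a)/b:

```python
import math

# Table 3.1 cells; a = working origin, b = cell length (Section 3.43).
midpoints = [1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65, 1.70, 1.75]
freqs = [1, 5, 6, 13, 8, 17, 14, 7, 1, 3]
a, b = 1.50, 0.05

# Coded values z_i = (x_i - a) / b, rounded to the integers -4, ..., 5.
zs = [round((x - a) / b) for x in midpoints]

n = sum(freqs)
S_Z = sum(f * z for z, f in zip(zs, freqs))
S_Z2 = sum(f * z * z for z, f in zip(zs, freqs))
S_Z1 = sum(f * (z + 1) ** 2 for z, f in zip(zs, freqs))   # S[(Z+1)^2]
check_ok = S_Z1 == S_Z2 + 2 * S_Z + n                      # check under Table 3.4

mean_z = S_Z / n
var_z = (S_Z2 - S_Z ** 2 / n) / (n - 1)
mean_x = a + b * mean_z            # (3.21)
sd_x = b * math.sqrt(var_z)        # (3.24), with b > 0

print(S_Z, S_Z2, S_Z1, check_ok)   # 41 325 482 True
print(round(mean_z, 4), round(var_z, 4), round(mean_x, 3), round(sd_x, 3))
```

Because the coded sums are whole numbers, the check can be an exact integer comparison.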


Exercise 3.4.

(Every student is expected to do problem No. 9.)

1. Making use of a working origin, find the mean, variance and standard deviation of the following sample of measurements of copper content (in percent) in 10 bronze castings: 85.54, 85.54, 85.72, 85.48, 85.54, 85.72, 86.12, 85.47, 84.98, 85.12.

2. Suppose the mean and variance of a distribution of lengths expressed in inches are 28.3 and 16.0. What would the mean, variance and standard deviation of the distribution be if the lengths were expressed in feet?

3. The burning lives (in hours, to the nearest 10 hours) of 10 incandescent light bulbs of a certain type were found to be as follows: 850, 900, 1370, 1080, 1060, 860, 1060, 1040, 1090, 1930. Find the mean, variance and standard deviation of this sample, making use of a working origin.

4. The Mathematical Aptitude Scores of the 24 Princeton Chemistry Department Seniors of the Class of 1938 were: 550, 569, 440, 608, 814, 595, 577, 595, 730, 698, 518, 705, 692, 563, 531, 582, 479, 505, 711, 539, 614, 614, 524, 653. Find the mean, variance and standard deviation of these scores.

5. A pair of dice can fall in 36 different "ways". One of these "ways" will yield a total of 2 dots, two will yield 3 dots, and so on. The frequencies of "ways" which will yield various total numbers of dots are as follows:

   Total No. of         Frequency
   Dots $x_i$           $f_i$

   2                     1
   3                     2
   4                     3
   5                     4
   6                     5
   7                     6
   8                     5
   9                     4
   10                    3
   11                    2
   12                    1

Find the mean, variance and standard deviation of the total number of dots. (Use a working origin.)

6. Thirty-two dice were thrown 100 times. The distribution of the total number of dots per throw was as follows:
   Total No. of dots    Frequency
   (cell midpoints)
   $x_i$                $f_i$

   90                    3
   95                    3
   100                   4
   105                  15
   110                  17
   115                  15
   120                  16
   125                  12
   130                   8
   135                   3
   140                   3
   145                   0
   150                   1

Find the mean, variance and standard deviation of the total number of dots per throw (by using either a working origin or full coding).

7. The following distribution of carbon content (percent) was obtained in 178 determinations on a certain mixed powder (Davies data):

   Percent carbon       Frequency
   (cell midpoint)
   $x_i$                $f_i$

   4.145                 1
   4.245                 2
   4.345                 7
   4.445                20
   4.545                24
   4.645                31
   4.745                38
   4.845                24
   4.945                21
   5.045                 7
   5.145                 3

Find the mean, variance and standard deviation of the carbon content (using a working origin or full coding).


8. The grouped frequency distribution of thickness measurements (in inches) determined from 50 places on a coil of sheet metal was found to be as follows (Westman data):

   Cell Midpoint        Frequency
   $x_i$                $f_i$

   .147                  1
   .148                  1
   .149                  4
   .150                  6
   .151                 10
   .152                  6
   .153                  7
   .154                  6
   .155                  6
   .156                  3

Find the mean, variance and standard deviation of this distribution (by using either a working origin or full coding).

9. Find the mean, variance and standard deviation of the grouped frequency distribution obtained in whichever one of the following problems in Exercise 2.2 you have done: No. 8, 9, 10, 11, 12, 13, 14. Use either a working origin or full coding. (Keep a record of your work.)

10. Suppose $\bar{U}$ and $s_U^2$ are the mean and variance of a sample of m U measurements, and $\bar{V}$ and $s_V^2$ are the mean and variance of a sample of n V measurements. Suppose all of these m + n measurements are put in one sample.

(a) Show that the mean of this sample is

$\frac{m\bar{U} + n\bar{V}}{m + n}$.

(b) Show that the variance of this sample is


CHAPTER 4. ELEMENTARY PROBABILITY

4.1 Preliminary Discussion and Definitions.

In the preceding chapters we have dealt with the problem of describing a given sample of quantitative measurements. We have shown how to describe such a sample graphically (by using a dot frequency diagram, a cumulative graph, a frequency histogram or a frequency polygon) and how to describe a sample numerically (by graphical calculation of the median, quartiles and other percentiles, and by arithmetical calculation of the mean and standard deviation). But, as pointed out in the introduction, we are usually interested in more than a mere description of a sample. We are interested in making inferences about the populations from which the samples come, i.e., in making statements about intervals within which population means, standard deviations and other population parameters are likely to lie. In order to make such statements, we must know something about how much sample means, standard deviations and other sample statistics vary from sample to sample when repeated samples of a stated size are drawn from the same population.

There are two approaches to the study of sample-to-sample fluctuations of sample statistics: experimental and mathematical.

In the experimental approach, we determine by repeated experiments (i.e., repeated drawing of samples) how a given sample statistic will be distributed. Suppose, for example, that we want to know something about the distribution of means in samples of 30 measurements from a given indefinitely large population. We draw a large number (say 100) of samples of 30 from the population and calculate the mean of each sample. We then make a frequency distribution of these means and find its mean and standard deviation. In the case of sampling from a finite population, one would have to return the sample of objects to the population after each drawing. The main difficulty with the experimental method of determining sampling fluctuations is that it is very time-consuming and often costly. Even in such simple experiments as rolling dice, it takes a great deal of time to accumulate enough data to see how sampling laws work.
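Today the experimental approach can be imitated by simulation. In this sketch the "indefinitely large population" is assumed, purely for illustration, to be throws of a fair die generated by Python's `random` module. We draw 100 samples of 30, as in the example above, and find the mean and standard deviation of the 100 sample means:

```python
import random
import statistics

def experimental_sample_means(num_samples=100, sample_size=30, seed=1):
    """Experimental approach: repeatedly draw samples and record each mean.

    The population is hypothetical: throws of a fair die, simulated with
    Python's pseudo-random generator and treated as indefinitely large.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.randint(1, 6) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

means = experimental_sample_means()
# Mean and standard deviation of the 100 sample means.
print(round(statistics.mean(means), 3), round(statistics.stdev(means), 3))
```

A simulated run costs seconds rather than hours of dice rolling, but it still gives only an experimental estimate of the sampling law, not the law itself.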

By taking the mathematical approach we can determine theoretical
sampling laws of some of the simpler sample statistics, particularly sample sums
and sample means, which apply pretty well to practical situations. It should be
stated, however, that there are many sample statistics which have very
complicated theoretical sampling laws, even for the simplest kinds of
populations. In such cases one often has to resort to experimental sampling,
using random numbers, to find out something about the sampling fluctuations.
The mathematical approach to the study of sampling laws is based on the theory
of probability. We shall now spend some time on the subject of probability.

The word probability or chance is used loosely in our everyday
conversation and we know vaguely what it means. To give a few examples, we talk
of the chance of winning a game of cards or dice or a football game; the
probability of its raining tomorrow; the chance of a person living to be so many
years old. In all these cases we are interested in a future event, of which the
outcome is uncertain, and about which we want to make a kind of prediction.
Sometimes we are content with a rough qualitative statement like "It is very
probable that it will rain tomorrow" or "A man has little chance of living a
hundred years". Sometimes we go further and tend to be numerical, as when we say
"There is a fifty-fifty chance that we will win the game", or "I'll bet you two
to one that it will rain today". In a mathematical discussion of probability we
try to present conditions under which we can make sensible numerical statements
about uncertainties, and to present methods of calculating numerical values of
probabilities and expectations.

It must be agreed that when the term probability can be applied so
loosely to so many diverse and complex phenomena as games, the weather, the span
of human life, etc., it is hardly conceivable that one can give it a definite
and precise meaning without some simplification. Familiar examples of
application do not always lend themselves to simple analysis. For instance, it
is well known that the science of weather forecasting is still far from
perfect, and for a layman to assign a probability to tomorrow's weather in any
doubtful case is hardly more than guesswork. Again, at a Princeton-Yale match,
it will not be surprising to find Princeton and Yale men forming quite different
estimates of the outcome of the game, even apart from sentimental reasons. In
such a case, it may well be questioned whether there is an unequivocal way of
assigning a probability.

These and many other considerations one may think of should convince
us that in order to be on solid ground we must confine ourselves at first to the
simplest phenomena of chance. (But if the study of statistics is to be useful
in science, engineering, or even everyday life, we will have to study more and
more complex cases and eventually stand on less solid ground.) An example of
the simplest kind of chance phenomenon is the toss of a coin. Here there are two
possible outcomes: head or tail. The question of probability is whether the
event "head" or the event "tail" will occur. Now although two men may well
disagree on tomorrow's weather or the result of a game, they would readily
agree to call the chances even that one would get a head or a tail in tossing
an ordinary coin. Indeed, agreement on this point is so universal that we say
"It's a toss-up" when we want to say that "Its probability is 1/2".

Another simple case is the roll of a die. An "ideal" die is a perfect
cube with six faces having 1 to 6 dots. If we roll it once there are six
possible outcomes; the face turning uppermost may be any of the six faces and
accordingly the number of dots we obtain may range from 1 to 6. Here again we
would agree to assign equal probabilities to the six faces (of an "ideal" die)
so that each gets its equal share (1/6th) of the total probability.

The case of a pack of playing cards is similar, though a little more
complicated. There are 52 cards in a pack, and if we pick a card "at random"
from a well-shuffled pack, the chance of getting any particular card (named in
advance) is 1/52. The chance of getting a spade is 1/4; that of getting a
spade or a heart is 1/2. The chance of getting a face card is 12/52.

It is easy to see how these probabilities are obtained. In the case
of the toss of a coin there are two possible cases, head and tail; and if the
coin is "ideal", i.e., uniformly and symmetrically made, and the tossing is
"fairly done", there is nothing in favor of the turning up of one or the
other. Or we may put it this way: any argument which may be advanced for one
side of the coin applies equally well to the other side, so much so that in the
end we have no reason to expect the one rather than the other. We then consider
the two possible alternatives, head and tail, to be equally likely, and the
probability of each is one (case) in two (equally likely cases), i.e., 1/2.
Similarly, the case of rolling a die admits six possible cases, i.e., the faces
with 1, 2, 3, 4, 5, 6 dots. If the die is "well made" (in particular not
loaded) and the rolling is "arbitrary", the six cases are considered equally
likely. Thus each face, being one (case) in six (equally likely cases), gets
the probability 1/6. The case of playing cards is entirely similar.

We are thus led to the following definition of probability in the
simplest situations:

Definition I. If an event E can happen in m cases out of a total of
n possible cases which are all considered by mutual agreement to be equally
likely, then the probability of the event E is defined to be m/n, or more
briefly Pr(E) = m/n.

Thus in the evaluation of the probability of an event we need to know
two numbers: the number of possible cases and the number of favorable cases,
i.e., those in which the event will have occurred. The ratio of the number of
favorable cases to the number of possible cases is the desired probability. Of
course, it must be agreed in each instance that the possible cases are all
equally likely. This agreement can often be reached by plain common sense, as
in simple cases like those mentioned above.
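Definition I can be applied mechanically whenever the equally likely cases can be listed. The Python sketch below assumes we represent the cases as a list and the event as a test applied to each case; `classical_probability` is a hypothetical helper name, not a standard function. It reproduces the die and playing-card probabilities given earlier (Python's `Fraction` automatically reduces 12/52 to 3/13).

```python
from fractions import Fraction
from itertools import product


def classical_probability(cases, event):
    """Pr(E) = m/n, where n is the number of equally likely cases and m is
    the number of those cases in which the event E occurs."""
    cases = list(cases)
    m = sum(1 for case in cases if event(case))
    return Fraction(m, len(cases))


# One roll of an ideal die: six equally likely faces.
die = range(1, 7)
print(classical_probability(die, lambda face: face == 6))  # 1/6

# One card drawn at random from a well-shuffled pack of 52 cards.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
pack = list(product(ranks, suits))
print(classical_probability(pack, lambda card: card[1] == "spades"))  # 1/4
print(classical_probability(pack, lambda card: card[1] in ("spades", "hearts")))  # 1/2
print(classical_probability(pack, lambda card: card[0] in ("J", "Q", "K")))  # 3/13
```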

This definition, as far as it goes, is perfectly clear-cut and agrees
with our intuitive notion of chance. But, as the reader will easily see, it is
difficult or impossible to apply as soon as we leave the field of coins, dice,
cards and other simple games of chance. To return to one of our earlier
examples, how does it apply to the probability of rain? What are the possible
cases? We might naively think that there are two contingencies: "rain" and "no
rain". But at any given locality it will not usually be agreed that they are
equally likely. In fact, it is exactly the relative likelihood of these two
cases that we are seeking, so we must beware of a vicious circle. (It should be
noticed that in simple cases of coins, dice, cards, etc., this vicious circle
is avoided by reasoning based on plain common sense.) In general, our first
definition of probability cannot be applied whenever it is impossible to make
a simple enumeration of cases which can be considered equally likely.

In order to assign probabilities to more complex phenomena we must
appeal to a principle of agreement based on observational evidence, namely:

Definition II. If (a) whenever a series of many trials is made, the
ratio of the number of times event E occurred to the total number of trials is
nearly p, and if (b) the ratio is usually nearer to p when longer series of
trials are made, then we agree in advance to define the probability of E as p,
or more briefly Pr(E) = p.

For example, the probability of getting a head in a toss of an "ideal"
coin is agreed to be 1/2 (under Definition I). In a large number of trials the
percentage of heads is found experimentally to be nearly 50 percent, that is,
heads come up in about half of the trials. Thus, in 1000 tosses we would get
"about" 500 heads, in 1200 rolls of a die we would get "about" 200 sixes, etc.
Interpreted in statistical language, we can think of 1000 tosses of a coin as
a sample of size 1000 from the indefinitely large population of tosses which
could be made with that coin. Thus, if we should consider a number of samples
of 1000 tosses, we might get some such sequence of numbers of heads as 488, 518,
508, 474, 497, etc. These numbers are all "nearly" equal to 50 percent of the
total number of trials, that is, the relative frequencies of heads are
distributed near .5 in a frequency distribution, with a certain standard
deviation. If smaller sample sizes had been used, the relative frequencies would
have been distributed around .5 with a larger scatter, i.e., with a larger
standard deviation. If still larger samples, say of 10,000, were used, the
standard deviation of the observed relative frequencies would have been less.
Actually, there is a theoretical distribution of relative frequencies of heads
in samples of n tosses, which will be discussed in Chapter 6 on the Binomial
Distribution. The standard deviation of this theoretical distribution will be
found to vary inversely as the square root of n. In actual experiments with
coins, one will find that the standard deviations of distributions of relative
frequencies in samples of different sizes behave in fairly close agreement with
this inverse square root law.
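Definition II and the inverse square root law can be illustrated by simulation. This Python sketch assumes an ideal coin simulated by `rng.random() < 0.5`. For each sample size n it draws 200 samples of n tosses, records the relative frequency of heads in each, and compares the observed standard deviation with sqrt(p(1 - p)/n), the standard binomial result (taken here as given, with p = 1/2). The function name and the seed are illustrative choices.

```python
import math
import random
import statistics


def relative_frequencies(n_tosses, n_samples, rng):
    """Relative frequency of heads in each of `n_samples` simulated samples
    of `n_tosses` tosses of an ideal coin."""
    freqs = []
    for _ in range(n_samples):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        freqs.append(heads / n_tosses)
    return freqs


rng = random.Random(2)
observed_sd = {}
for n in (100, 400, 1600):
    freqs = relative_frequencies(n, 200, rng)
    observed_sd[n] = statistics.pstdev(freqs)
    theoretical_sd = math.sqrt(0.5 * 0.5 / n)  # varies inversely as sqrt(n)
    print(f"n = {n:5d}: mean {statistics.mean(freqs):.3f}, "
          f"observed s.d. {observed_sd[n]:.4f}, theoretical s.d. {theoretical_sd:.4f}")
```

Each fourfold increase in n should roughly halve the observed standard deviation, which is what "varies inversely as the square root of n" means.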

Definition II gives us a simple way to "estimate" probabilities from
experimental results. For instance, from mortality tables compiled in past
years, we find that of 100,000 new-born white American males, about 92,300 are
alive at age 20. This allows us to estimate the probability that a new-born
white American male will live to be 20 years old as 0.923. (Because of the way
the data were accumulated, we believe this is "about" right. In fact, it and
many other similarly estimated probabilities of survival are used in calculating
life insurance premiums.) Similarly, if we have examined 1000 electric light
bulbs manufactured by a certain company and found 20 defective, we estimate the
probability that a bulb of this type ("taken at random") is defective to be
20/1000 = .02 = 2%.


Exercise 4.1.


1. What are the greatest and least values a probability can have? What do
these extreme values mean?

2. If the probability that an event occurs is m/n, what is the probability that
it does not occur?

3. In a roll of one die, what is the probability that the number of dots
obtained will not exceed 4? That the number of dots will be an even number?


4. If we draw a card from a pack of playing cards, what is the probability of
not obtaining a spade? Of obtaining the nine of spades or the ten of spades?
Of obtaining a nine or a ten of some single suit? Of obtaining a card with a
nine or a ten on it?


5. How would you estimate the probability that a student picked at random from
a list of all Princeton undergraduates is over 6 feet tall?


6. The following mortality table shows the number of survivors to various ages
of 100,000 new-born white American males:


      Age x      Survivors to age x
        0            100,000
       10             93,601
       20             92,293
       30             90,092
       40             86,880
       50             80,521
       60             67,787
       70             46,739
       80             19,360
       90              2,812
      100
(a) Estimate the probability of a new-born infant
in this class living to be 60 years old.

(b) Estimate the probability of a 20-year-old in
this class living until he is 50 years old.

(c) Make a cumulative polygon of "age at death"
and from it estimate the probability of a
25-year-old living to be 75.

(d) Using the cumulative polygon of (c), to what age
would you estimate that a 50-year-old has
probability 1/2 of living?


7. How would you estimate experimentally the probability of getting fewer than
3 heads in throwing five coins? Carry out this experiment 60 times and record
the results.

8. How would you estimate experimentally the probability of getting a busy
signal between 4 and 6 P.M. on weekdays from a telephone number picked at
random from the telephone directory?

9. How would you estimate the probability that the first digit of a number
picked at random out of the front-page news text of a newspaper is a 1, 2, or
3? Estimate it from 50 such numbers.


4.2 Probabilities in Simple Repeated Trials.

We have discussed the probability involved in the toss of one coin.
Now suppose that we toss two coins. The situation becomes a little more
complicated. What, for example, is the probability of getting two heads in
tossing two coins? In order to solve this problem mathematically we fall back
on Definition I of probability. First, we have to find the number of possible
cases which can happen when two coins are tossed. This is easily seen to be 4
if we actually write down all the possible combinations (H stands for head, T
for tail):

             First Coin   Second Coin
   Case 1:       H             H
   Case 2:       H             T
   Case 3:       T             H
   Case 4:       T             T

More simply we can write these as 

HH HT TH TT.

Hence, the total number of cases is 4. Since it is agreed that it is equally
likely that either coin will turn up head or tail, it will (usually) be agreed
that the four combinations in pairs will be equally likely. We must now find
the number of cases in which the desired event, i.e., that of getting two heads,
can occur. This is the case if, and only if, we have HH, i.e., in just one
case. Therefore, applying Definition I, the probability we seek is 1/4. Note
that the agreement about the four combinations of pairs being equally likely
would not hold if the coins were glued together or even if they were magnetic.
It is instructive to work out the probability of getting one head and
one tail. The only modification now is the number of cases in favor of this
event. Now we see both HT and TH will do for our purpose, for all we care about
is a head and a tail, no matter which coin is which. In other words, the first
coin may turn up head and the second tail, or vice versa, and in either case we
shall be satisfied. Hence, the number of cases in which the desired event can
occur is two and the probability we seek is 2/4 or 1/2.

Of course, the probability of getting two tails is the same as that
of getting two heads, that is, 1/4. Notice that

     1/4 + 1/2 + 1/4 = 1,

and also that the three events we have considered, i.e.,

(i) two heads

(ii) one head and one tail

(iii) two tails

exhaust all the possibilities of the outcome of tossing two coins. It is
because of this fact that their respective probabilities add up to 1, as shown
above.

Our next step will be to consider three coins. Here we can write
down all the possibilities as follows:

HHH, HHT, HTH, THH, HTT, THT, TTH, TTT.

It is easy to see that the total number of possible cases is 2 x 2 x 2 = 8.
This is because each coin has two possibilities, head or tail, and in combining
the possibilities for all the coins we multiply them together.

What is the probability of getting exactly two heads? All we have to
do is to pick from the above list those combinations with exactly 2 H's. They
are

HHT, HTH, THH

and the number is three. Hence, the desired probability is 3/8.

It is easy to check the following results:

Pr (of getting 0 heads) = 1/8
Pr (of getting 1 head ) = 3/8
Pr (of getting 2 heads) = 3/8
Pr (of getting 3 heads) = 1/8
                  Total = 1
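For a small number of coins the listing can be left to a computer. This Python sketch enumerates all 2^n equally likely outcomes with `itertools.product` and counts the heads in each; `head_count_distribution` is our own name for it. For n = 3 it reproduces the table above, and for n = 2 the probabilities 1/4, 1/2, 1/4. Because the list doubles with every added coin, this brute-force approach quickly becomes impractical, which is the difficulty taken up in the next paragraph.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def head_count_distribution(n):
    """Enumerate all 2**n equally likely outcomes of tossing n coins and
    return Pr(exactly x heads) for x = 0, 1, ..., n."""
    outcomes = list(product("HT", repeat=n))  # e.g. ('H', 'H', 'T')
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(counts[x], len(outcomes)) for x in range(n + 1)}


for x, pr in head_count_distribution(3).items():
    print(x, pr)  # prints 0 1/8, 1 3/8, 2 3/8, 3 1/8 on separate lines
```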
Now let us take up the general case of n coins. The number of heads
we may get is of course some number between 0 and n (both included); call this
number x. The general problem we want to solve is this: what is the probability
of getting exactly x heads if n coins are tossed? From our experience with
2 or 3 coins, it is easy to see that the total number of possible cases is
2 x 2 x ... x 2 (n times) = 2^n. But how many cases show exactly x heads? Since
the numbers n and x are general, we cannot write down all the cases as we did
in the case of 2 or 3 coins. Even for particular numbers like n = 100 and
x = 40 it would be too laborious to do so. We must, therefore, count them
without actually listing them. Here a bit of mathematics comes in, which we
shall discuss in Section 4.3.

We have considered a single toss of 1, 2, 3, ..., n coins. Instead,
we may consider 1, 2, 3, ..., n successive tosses of a single coin. It is
obvious that the situation is exactly the same if one coin is as good as another
and also one toss as good as another.

Let us turn to dice. If we roll two dice, what is the number of all
possible cases? It is easily seen to be 6 x 6 = 36, and we can actually write
down all the combinations as follows (the first number of the pair refers to the
first die, the second number to the second die; the dice might be colored
differently, for example):

(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6).

This list gives us the complete information for all questions regarding two
dice. For example, what is the probability that at least one die shows 6 dots?
There are 11 favorable cases:

(1,6), (2,6), (3,6), (4,6), (5,6), (6,6),
(6,1), (6,2), (6,3), (6,4), (6,5).

Thus the answer is: Pr (getting at least one 6) = 11/36.

As another example, what is the probability of getting a total of 8
dots? We see that a total of 8 dots can be obtained in the following 5 ways:

(2,6), (3,5), (4,4), (5,3), (6,2).

Thus the answer is: Pr (getting a total of 8 dots) = 5/36.
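The 36 ordered pairs can also be generated and searched by a program. This Python sketch assumes, as the text does, that all 36 pairs are equally likely, and recovers the two answers just found.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered pairs (first die, second die), assumed equally likely.
pairs = list(product(range(1, 7), repeat=2))

at_least_one_six = [p for p in pairs if 6 in p]
total_of_eight = [p for p in pairs if sum(p) == 8]

print(len(pairs))                                   # 36
print(Fraction(len(at_least_one_six), len(pairs)))  # 11/36
print(Fraction(len(total_of_eight), len(pairs)))    # 5/36
print(total_of_eight)  # [(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
```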
Now consider the following problem. If we roll n dice (or, equivalently,
if we roll a die n times), what is the probability of getting exactly
x "sixes"? The problem is seen to be similar to the one posed for coins and
will be solved later.

Before we leave this topic it is instructive to consider the case of
biased coins and loaded dice. Up to now we have supposed that the coin is
unbiased, for which reason we agreed that the probability of head (or tail)
was 1/2. Now suppose that this is not so. Such a case can actually happen if,
e.g., the coin is poorly made and is heavier on one side, or if the coin is
worn more on one side than the other. No simple intuitive reasoning will tell
us what the probability of head is now. A practical way of estimating the
probability would be experimental, i.e., to toss the coin a large number of
times and apply Definition II to the relative frequency of heads obtained.
Suppose that in 1000 throws of a coin the relative frequency of heads turns
out to be .45; then by Definition II, .45 would be considered as an estimate of
the probability p of a head, which would then have to be used as though it were
the true probability, although it would be necessary to keep in mind that all
we really know is that p has a value somewhere near .45. If the probability of
a head is p, then the probability of a tail is 1 - p.

With such a biased coin we can again ask questions similar to those already
asked about unbiased coins. But it will be seen that we need a new method of
computing probabilities of this kind. This will be discussed in Chapter 6 in
the treatment of the binomial distribution.

One easy way of imitating a "biased coin" is the following: take
an ordinary "true" die and mark two of the faces with the letter H and the
remaining four faces with the letter T. Then this die will behave like a biased
coin. The probability of H (head) is 2/6 = 1/3 and that of T (tail) is
4/6 = 2/3.
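The marked die also gives a convenient way to try out Definition II on a coin whose bias we know in advance. The sketch below simulates the marked die with Python's `random` module (here faces 1 and 2 carry H, an arbitrary choice) and checks that the relative frequency of H in many tosses comes out near 1/3.

```python
import random

# Faces 1 and 2 are marked H, faces 3 to 6 are marked T (which two faces
# carry H is an arbitrary choice; only the count of H faces matters).
MARKS = {1: "H", 2: "H", 3: "T", 4: "T", 5: "T", 6: "T"}


def toss_biased_coin(rng):
    """Roll an ideal die and read off the letter on the top face."""
    return MARKS[rng.randint(1, 6)]


rng = random.Random(3)
tosses = [toss_biased_coin(rng) for _ in range(30000)]
rel_freq_heads = tosses.count("H") / len(tosses)
print(round(rel_freq_heads, 3))  # close to 1/3, as Definition II leads us to expect
```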


Exercise 4.2.

1. If a dime and a nickel are tossed, what is the probability that the dime 
shows head and the nickel tail? What is the probability that one coin shows 
head and the other coin shows tail? 

2. If three coins are tossed, what is the probability of at least two heads?
Of at most two heads?


3. Construct a table showing all the possible cases when four coins are tossed.
Find the probabilities of 0, 1, 2, 3, 4 heads, respectively.

4. What is wrong with the following argument: "If two coins are tossed, there
are three possibilities: 2 heads, 1 head and 1 tail, two tails. Hence, the
probability of 1 head and 1 tail is 1/3"?


5. Two dice are rolled. What are the various total numbers of dots we may
obtain? List their respective probabilities. What total number has the
greatest probability? What is the probability of obtaining a total not
exceeding 9?


6. Work Problem 5 using three dice instead of two.

7. Three dice are rolled. What is the probability of getting a pair (i.e.,
exactly two faces alike)? The probability that none of the dice shows an ace?

8. Imagine we have dice made in the form of a regular tetrahedron (four faces)
and marked with 1 to 4 dots on the faces. The number of dots on the bottom
face is the number we get when we toss the die. Now if we toss two such dice,
what is the probability of getting exactly one ace? At least one ace? Exactly
two aces? What are the probabilities of getting the various possible total
numbers of dots?


4.3 Permutations.

In Section 4.2 we ran into the following problem: If n coins are
tossed, in how many ways can we get x heads? Let us denote head by H and
tail by T as before. Now if we mark the coins 1 to n and think of them laid
out in a row,

(1), (2), ..., (n),

then each coin may be an H or a T, and the problem reduces to finding the number
of possible arrangements of n symbols composed of x H's and (n - x) T's. For
example, if n = 3, x = 2, we would have the following three arrangements:


HHT, HTH, THH.
For n = 5, x = 3, we have the following 10 arrangements:

HHHTT, HHTHT, HTHHT, THHHT, HHTTH,

HTHTH, THHTH, HTTHH, THTHH, TTHHH.

These arrangements are called permutations. More specifically, in the first case
we have 3 permutations of 2 H's and 1 T; in the second case we have 10
permutations of 3 H's and 2 T's.

The general rule on the number of permutations in any given case may
be stated as follows:

Rule on Number of Permutations: If there are n objects, of which i
are alike, another j are alike, another k are alike, and so on, then the number
$P^n(i, j, k, \ldots)$ of different permutations is given by the formula

$$P^n(i, j, k, \ldots) = \frac{n!}{i!\,j!\,k!\cdots} \tag{4.1}$$

where the symbol n! (read "n factorial") means the product of all integers from
n down to 1: n! = n(n - 1) ... 3 · 2 · 1.

For example, in the problem of n coins, we have n symbols, of which x
are alike (all H) and another n - x are alike (all T). Hence the rule gives us
the number of permutations

$$P^n(x, n - x) = \frac{n!}{x!\,(n - x)!} \tag{4.2}$$

If n = 3, x = 2, we get

$$\frac{3!}{2!\,1!} = 3;$$

if n = 5, x = 3, we get

$$\frac{5!}{3!\,2!} = 10.$$

We shall begin with the simple case where all n things are different.
In this case i = j = k = ... = 1 and the rule reduces to the following:

Special Case: The number of permutations of n distinct objects is
equal to n!, i.e.,

$$P^n(1, 1, \ldots, 1) = n!. \tag{4.3}$$

As an example of this special case, consider in how many different arrangements
5 people can seat themselves on a bench. The answer is
5! = 5 x 4 x 3 x 2 x 1 = 120.
To prove the rule in the Special Case, we imagine n empty boxes marked
1, 2, ..., n lined up in a row, and n different objects to put into the boxes,
one object in each box.

A box will be filled when it has one object in it. In the first box
we can put any one of the n objects on hand; hence there are n ways of filling
the first box. Suppose this has been done. Then we are left with n - 1 objects.
We can put any one of them into the second box. Hence there are n - 1 ways of
filling the second box after the first one has been filled. But each way the
first box can be filled can be combined with each way the second can be filled.
Hence there are n(n - 1) ways the first two boxes can be filled. For the third
box, there are n - 2 objects to choose from. Hence, there are n(n - 1)(n - 2)
ways the first three boxes can be filled. We can continue this reasoning down
to the last empty box. Then there is only one object to select, and hence only
one way to fill the last box. Therefore, our reasoning leads us to the
conclusion that there are

(n)(n - 1)(n - 2) ... (2)(1) = n!

different ways of filling the n boxes with the n different objects.
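The box-filling argument translates directly into a loop: multiply together the number of objects still available for each successive box. This Python sketch (the function name is hypothetical) compares the result with `math.factorial` and with a direct listing of all orderings of five objects.

```python
import math
from itertools import permutations


def ways_to_fill_boxes(n):
    """Count the ways to fill n boxes with n distinct objects, one per box,
    by multiplying the choices available for each successive box."""
    ways = 1
    objects_left = n
    for _box in range(n):
        ways *= objects_left  # any remaining object may go in this box
        objects_left -= 1     # one fewer object is left for the next box
    return ways


print(ways_to_fill_boxes(5))             # 120
print(math.factorial(5))                 # 120
print(len(list(permutations("ABCDE"))))  # 120, by direct listing
```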

Now, if some of the things are alike (i.e., indistinguishable from
each other) we shall have fewer ways of arranging them into distinguishable
arrangements. For example, suppose we have 3 H's and 2 T's; for a moment let us
mark them as $H_1, H_2, H_3, T_1, T_2$, although in reality $H_1$, $H_2$, and
$H_3$ are indistinguishable among themselves, and so are $T_1$ and $T_2$. We
have just seen that the total number of permutations, regarding the H's and T's
as all distinct, is 5!. Now take any permutation like

$$H_1 T_1 T_2 H_2 H_3.$$

If we permute the three H's among themselves while leaving the T's alone, we get
altogether 3! = 6 possible arrangements:

$$H_1T_1T_2H_2H_3, \quad H_1T_1T_2H_3H_2, \quad H_2T_1T_2H_1H_3,$$
$$H_2T_1T_2H_3H_1, \quad H_3T_1T_2H_1H_2, \quad H_3T_1T_2H_2H_1.$$

Now if the 3 H's were indistinguishable, i.e., if we erased the subscripts on
the H's, all of these 6 permutations would be identical, and would be
$H T_1 T_2 H H$.
Thus the number of distinguishable permutations is only 1/6th of the original
number of permutations when the H's were different. Similarly, owing to the
fact that the T's are indistinguishable when the subscripts are dropped, the
number of permutations is again reduced in the ratio of 2 to 1. Altogether, the
number is reduced in the ratio of 3! x 2! or 6 x 2 = 12 to 1. Thus the reduced
number of (distinguishable) permutations is

$$\frac{5!}{3!\,2!} = 10.$$

You should have a firm grasp of this simple case. Then the following
general argument will be easy to understand.

The total number of permutations of n objects when regarded as all
distinguishable is n!. Now suppose i of the objects are made indistinguishable.
If we take any particular permutation of the n objects and permute those i
objects among themselves, keeping the positions of the remaining n - i objects
fixed, we get (by the previous reasoning) i! permutations. These i! permutations
become indistinguishable when the i objects are made identical; thus the i!
permutations collapse into one distinguishable permutation when these i objects
are made indistinguishable among themselves. Thus the number of permutations is
reduced in the ratio of i! to 1. Similarly, if we make another j objects alike,
the number is further reduced in the ratio of j! to 1, and so on. Thus the final
formula becomes, after the successive reductions,

$$\frac{n!}{i!\,j!\,k!\cdots}.$$

Consider an example. How many different permutations can we get from
the word "statistics"? In other words, in how many ways can we scramble the
letters in "statistics" and obtain arrangements which are distinguishable from
one another? Let us break it down into its component letters:

a, c, i, i, s, s, s, t, t, t.

There are 10 letters: one a, one c, two i's, three s's and three t's. By
formula (4.1) the number of permutations is equal to

$$\frac{10!}{1!\,1!\,2!\,3!\,3!} = 50{,}400.$$
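Formula (4.1) needs only the sizes of the groups of alike objects. This Python sketch counts the groups with `collections.Counter` and divides n! by the factorial of each group size; the helper name is our own. For a short word the answer can be checked by listing every ordering with `itertools.permutations` and discarding duplicates.

```python
import math
from collections import Counter
from itertools import permutations


def distinguishable_permutations(symbols):
    """Formula (4.1): n! divided by i! j! k! ..., one factorial for each
    group of alike symbols."""
    result = math.factorial(len(symbols))
    for group_size in Counter(symbols).values():
        result //= math.factorial(group_size)  # exact: each partial quotient is an integer
    return result


print(distinguishable_permutations("statistics"))  # 50400
print(distinguishable_permutations("HHHTT"))       # 10

# Brute-force check for a short word: list every ordering of the five
# symbols (5! = 120 of them) and keep only the distinct ones.
print(len(set(permutations("HHHTT"))))             # 10
```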


Another problem of permutation is the following. Suppose we have n
distinct objects. Instead of permuting all of them, let us choose x of them
and then arrange those x objects in all possible orders. The number of different
ways of doing this is given by the rule below.

Rule on Number of Permutations of x out of n: The number of ways of permuting x
objects chosen from n distinct objects is given by

$$P^n_x = \frac{n!}{(n - x)!} \tag{4.4}$$

The reasoning in establishing this rule is similar to that used for
the previous rule on permutations. We imagine x boxes numbered 1 to x, and n
objects, all distinct. We want to fill these boxes with x objects chosen from
the n objects, one in each box. The first box can be filled in n ways, the
second in n - 1 ways, and so on. Since there are x boxes we get x consecutive
factors starting with n and going down to (n - x + 1), that is

n(n - 1) ... (n - x + 1).

Now let us multiply this number by (n - x)(n - x - 1) ... (2)(1), that is
(n - x)!, and then divide by it. We obtain, finally,

$$P^n_x = \frac{n(n - 1)\cdots(n - x + 1)\,(n - x)!}{(n - x)!} = \frac{n!}{(n - x)!}.$$


Example: The number of ways of arranging two different letters out of the word
town (i.e., the number of permutations of 2 out of 4) is

$$P^4_2 = \frac{4!}{2!} = 12.$$

The arrangements are:

to, ot, tw, wt, tn, nt, ow, wo, on, no, wn, nw.
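Rule (4.4) can be checked against a direct listing. In this Python sketch `permutations_x_of_n` is a hypothetical helper; `itertools.permutations(iterable, r)` generates the ordered selections of r items, and Python 3.8 and later also provide `math.perm(n, k)` for the same count.

```python
import math
from itertools import permutations


def permutations_x_of_n(n, x):
    """Rule (4.4): n(n - 1)...(n - x + 1) = n! / (n - x)!"""
    return math.factorial(n) // math.factorial(n - x)


arrangements = ["".join(p) for p in permutations("town", 2)]
print(permutations_x_of_n(4, 2))  # 12
print(len(arrangements))          # 12
print(arrangements)               # the same 12 pairs as in the text, in another order
```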


Exercise 4.3.

1. In how many different ways can 10 men line up in a single line? How many if
a specified man has to stand at the left end? How many if a specified man must
stand at the left end and a specified man at the right end?

2. How many numbers can we obtain by rearranging the digits in the number 243402?
How many if numbers beginning with 0 are excluded?






3. How many 4-letter code "words" can be made from the letters of the alphabet
if no repetition is allowed? If any letter can be repeated any number of times?
If the first letter has to be w and no repetition is allowed?

4. In how many ways can 20 students seat (arrange) themselves in a room which
has 30 seats?

5. In how many different orders can four people seat themselves around a round
table? In how many orders can they seat themselves along a bench?



4.4 Combinations.

Now we come to combinations, which are more important in statistics
than permutations. Suppose we have n (distinct) objects and we want to choose
x from them without paying any attention to the arrangement of the x objects
chosen. The number of ways of doing this is given by the following rule:

Rule on Number of Combinations: If there are n different objects, the number
of ways $C^n_x$ of selecting x out of the n objects is given by the formula

$$C^n_x = \frac{n!}{x!\,(n - x)!}.$$

$C^n_x$ is called the number of combinations of x objects out of n.

If we were to consider the permutations of x out of n, we would have
$P^n_x = n!/(n - x)!$ permutations. We know by the rule of permutation that each
choice of x objects would give rise to x! different permutations. But all of
these x! different permutations are made up of a single combination of x
objects. Hence, one can break the permutations of x out of n into sets of x!
permutations, the permutations in each of these sets being the same (one)
combination of objects. Hence the number of different combinations of x objects
out of n is

$$C^n_x = \frac{P^n_x}{x!} = \frac{n!}{x!\,(n - x)!}.$$

You will note that $P^n(x, n - x)$ is the same as $C^n_x$.
Example: Three men are to be chosen from 5 men; in how many ways can they be
selected?

The answer is

$$C^5_3 = \frac{5!}{3!\,2!} = 10.$$

If we label the 5 men A, B, C, D, E, we can write down all the possibilities:
ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE.
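The derivation above, dividing the permutations of x out of n by x!, can be written out directly. This Python sketch (the helper name is our own) does so, checks it against `math.comb` (Python 3.8 and later), and lists the ten selections of three men from A, B, C, D, E with `itertools.combinations`, which yields them in the same order as the list above.

```python
import math
from itertools import combinations


def combinations_x_of_n(n, x):
    """Number of combinations of x objects out of n, obtained as in the text:
    permutations of x out of n, divided by x!."""
    permutations_x = math.factorial(n) // math.factorial(n - x)
    return permutations_x // math.factorial(x)


print(combinations_x_of_n(5, 3))  # 10
print(math.comb(5, 3))            # 10, the standard-library equivalent

selections = ["".join(group) for group in combinations("ABCDE", 3)]
print(selections)
# ['ABC', 'ABD', 'ABE', 'ACD', 'ACE', 'ADE', 'BCD', 'BCE', 'BDE', 'CDE']
```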


Now we can solve the problem about n coins raised in Section 4.2, namely:
If we toss a coin n times, what is the probability that we will get exactly x heads?
4.41 Binomial coefficients.

The quantities $C^n_x$ are also called binomial coefficients, for they may
be obtained by expanding a binomial expression raised to the n-th power. Thus,
consider the binomial expression $(T + H)^n$. We have

$$(T+H)^n = C^n_0 T^n + C^n_1 T^{n-1}H + C^n_2 T^{n-2}H^2 + \cdots + C^n_x T^{n-x}H^x + \cdots + C^n_n H^n. \tag{4.7}$$

Note that the general term in this expansion is $C^n_x T^{n-x}H^x$; if we call H
the head of a coin and T the tail, then we may regard this term as telling us
symbolically that "there are $C^n_x$ ways in which n coins can fall so that there
are x heads and n-x tails". For x = 0 we have the first term $T^n$, which states
that there is only one way the n coins can fall so that we get 0 heads and n
tails; for x = 1, we have the second term $C^n_1 T^{n-1}H$, which states that the
n coins can fall in $C^n_1$ ways so that we get 1 head and n-1 tails; and so on.

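If we put H = 1/2 and T = 1/2, we get

$$\left(\tfrac{1}{2}+\tfrac{1}{2}\right)^n = \left(\tfrac{1}{2}\right)^n + C^n_1\left(\tfrac{1}{2}\right)^n + \cdots + C^n_x\left(\tfrac{1}{2}\right)^n + \cdots + \left(\tfrac{1}{2}\right)^n = 1.$$

The general term $C^n_x (1/2)^n$ now gives us the probability that x heads (and
n-x tails) will appear in throwing n coins. The first term $(1/2)^n$ is the
probability of getting 0 heads and n tails; the second term $C^n_1 (1/2)^n$ is the
probability of getting 1 head and n-1 tails; and so on.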
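For small n, the interpretation of the general term as a probability can be checked by brute force. This sketch assumes a fair coin, and its function names are our own. It computes $C^n_x (1/2)^n$ exactly with `fractions.Fraction` and compares the result with a count over all $2^n$ equally likely head/tail sequences generated by `itertools.product`.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_heads(n: int, x: int) -> Fraction:
    """Probability of exactly x heads in n tosses of a fair coin: C(n, x) * (1/2)**n."""
    return Fraction(comb(n, x), 2**n)


def prob_heads_by_enumeration(n: int, x: int) -> Fraction:
    """Same probability, found by listing all 2**n equally likely H/T sequences."""
    outcomes = list(product("HT", repeat=n))
    favorable = sum(1 for seq in outcomes if seq.count("H") == x)
    return Fraction(favorable, len(outcomes))


n = 4
for x in range(n + 1):
    print(x, prob_heads(n, x), prob_heads_by_enumeration(n, x))
# The terms of the expansion add up to (1/2 + 1/2)**n = 1.
print(sum(prob_heads(n, x) for x in range(n + 1)))
```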


Table 4.2 shows the values of $C^n_x$ for all values of n from 0 to 10.
Note that the entry for any given row can be constructed from the row immediately
above it by adding the number above that entry to that number's left-hand
neighbor; in symbols, $C^n_x = C^{n-1}_x + C^{n-1}_{x-1}$. For example, the entry
126 (n = 9, x = 4) is found by adding 70 and 56.


TABLE 4.2

Binomial Coefficients $C^n_x$ for n = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

                               x
 n    0    1    2    3    4    5    6    7    8    9   10   Total
 0    1                                                       2^0
 1    1    1                                                  2^1
 2    1    2    1                                             2^2
 3    1    3    3    1                                        2^3
 4    1    4    6    4    1                                   2^4
 5    1    5   10   10    5    1                              2^5
 6    1    6   15   20   15    6    1                         2^6
 7    1    7   21   35   35   21    7    1                    2^7
 8    1    8   28   56   70   56   28    8    1               2^8
 9    1    9   36   84  126  126   84   36    9    1          2^9
10    1   10   45  120  210  252  210  120   45   10    1    2^10
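The row-by-row construction described for Table 4.2 can be sketched as follows. `binomial_rows` is a name chosen here for illustration. Each new row is obtained from the row above by adding every entry to its left-hand neighbor, and the printed row sums should reproduce the Total column, $2^n$.

```python
from math import comb


def binomial_rows(max_n: int) -> list[list[int]]:
    """Rows n = 0..max_n of binomial coefficients, built by the addition rule."""
    rows = [[1]]
    for _ in range(max_n):
        above = rows[-1]
        # Pad with zeros so the first and last entries (both 1) fall out naturally:
        # new[x] = above[x] + above[x - 1].
        padded = [0] + above + [0]
        rows.append([padded[x + 1] + padded[x] for x in range(len(above) + 1)])
    return rows


for n, row in enumerate(binomial_rows(10)):
    print(n, row, sum(row))

# Cross-check every row against math.comb.
print(all(row == [comb(n, x) for x in range(n + 1)] for n, row in enumerate(binomial_rows(10))))
```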


Exercise 4.4.

1. In how many ways can the members of a committee of three be chosen from 7 men? 

2. In how many ways can a committee of three men and two women be selected from
5 men and 3 women? Solve this problem by formula first and then verify your
results by enumeration.

3. If 5 coins are tossed, what is the probability of getting 0, 1, ..., 5 heads
respectively? What is the probability of getting less than 3 heads? Of getting
3 or more? Of getting at least 2 heads and at least 2 tails?

4. If we want to pick r objects from n objects so that one particular object 
must always be included, in how many ways can this be done? Answer the same 
question replacing "included" by "excluded".


5. Generalize the previous problem: in how many ways can we pick r objects from
n objects so that s (s < r) particular objects must always be included? So that
s particular objects must always be excluded?

6. In how many ways can 17 coins fall so that exactly 10 heads show? So that
at least 3 heads show? 

7. Show that $C^n_x = C^n_{n-x}$.

8. Draw a frequency histogram for the number of ways of getting 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10 heads in throwing 10 coins.

9. What number of heads can be obtained in the most ways when throwing 42 coins?
43 coins? n coins if n is even? n coins if n is odd?

10. By looking at Table 4.2, write down the values of $C^{11}_x$ for x = 0, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11.

11. In how many different ways can a hand of 13 cards be selected from a pack 
of 52 playing cards? 

12. How many different luncheons consisting of soup, an entree, two vegetables,
dessert and beverage can be ordered from a menu which lists 3 soups, 5 entrees,
4 vegetables, 7 desserts and 3 beverages?


4.5 Calculation of Probabilities.

There are several simple rules for calculating probabilities which will 
be stated and illustrated in this section. 

If E, F, ... are events, we often want to know the probabilities of the
following events derived from them:

the event "not E" (i.e., "E does not occur")

the event "E or F" (i.e., "either E or F or both occur")

the event "E and F" (i.e., "both E and F occur").

For example, if we roll a die, E may be the event that "we get a 6", and F
may be the event that "we get a 5". Then the event "not E" is simply that of
not getting 6, i.e., any of the faces 1 to 5. The event "E or F" is that of
"getting either 5 or 6". In this example the event "E and F" is impossible,
because we cannot get 5 and 6 at the same time. The following example will
illustrate this case better. Suppose we roll a die twice. Let E now denote "the
event that we get 6 in the first roll" and F "the event that we get 6 in the
second roll". Then "E and F" is possible, and means we get 6 in both the first
roll and the second roll.


4.51 Complementation.


Rule I: Complementation. Suppose E is an event. Then

$$\Pr(\text{not } E) = 1 - \Pr(E).$$

For if there are n equally likely cases and E occurs in m cases, then
the event "not E" occurs in the remaining n - m cases. Hence

$$\Pr(\text{not } E) = \frac{n-m}{n} = 1 - \frac{m}{n} = 1 - \Pr(E).$$


To illustrate the application of Rule I, let us consider an 

Example: Three coins are tossed. What is the probability of getting at least
one head? 

If we denote the event of "getting at least one head" by E, then "not E" 
is the event of "getting 0 heads" (i.e., 3 tails). Hence by Rule I 

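$$\begin{aligned}
\Pr(\text{getting at least one head}) &= 1 - \Pr(\text{getting 0 heads}) \\
&= 1 - \Pr(\text{getting 3 tails}) \\
&= 1 - \left(\tfrac{1}{2}\right)^3 = 1 - \tfrac{1}{8} = \tfrac{7}{8}.
\end{aligned}$$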
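Rule I can be checked on this example by listing the 8 equally likely outcomes. The minimal sketch below assumes fair coins and generates the outcomes with `itertools.product`. The direct count of outcomes with at least one head and the complement calculation should both give 7/8.

```python
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of tossing three coins.
outcomes = list(product("HT", repeat=3))

# Direct count: outcomes containing at least one head.
direct = Fraction(sum(1 for o in outcomes if "H" in o), len(outcomes))

# Rule I: 1 - Pr(not E), where "not E" is "no heads", i.e. all tails.
via_complement = 1 - Fraction(sum(1 for o in outcomes if "H" not in o), len(outcomes))

print(direct, via_complement)
```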



4.52 Addition of probabilities for mutually exclusive events.

Rule II: Addition. If two events E and F are mutually exclusive (i.e., they
cannot occur together), then


Pr(E or F) = Pr(E) + Pr(F). 


For, suppose there are n equally likely cases such that E occurs in
r of them, and F occurs in s of them. Then, since E and F do not occur together,
there is no overlapping between these r + s cases. The event "E or F" occurs
in these r + s cases and these only. Hence

$$\Pr(E \text{ or } F) = \frac{r+s}{n} = \frac{r}{n} + \frac{s}{n} = \Pr(E) + \Pr(F).$$

Similarly we see that if there are several events E, F, G, ..., all
mutually exclusive, we have

$$\Pr(E \text{ or } F \text{ or } G \text{ or } \ldots) = \Pr(E) + \Pr(F) + \Pr(G) + \cdots. \tag{4.10}$$


Let us illustrate Rule II by the following

Example: Suppose a card is drawn at random from a pack of playing cards. What
is the probability that it is either a "heart" or the "queen of spades"?

E is the event "heart" and F is the event "the queen of spades". They
are mutually exclusive because a heart cannot be the queen of spades. Thus
Rule II is applicable. Since there are 13 hearts,

$$\Pr(E) = \frac{13}{52} = \frac{1}{4}.$$

Since there is just one "queen of spades",

$$\Pr(F) = \frac{1}{52}.$$

Hence

$$\Pr(E \text{ or } F) = \Pr(E) + \Pr(F) = \frac{13}{52} + \frac{1}{52} = \frac{14}{52}.$$
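A small enumeration over a 52-card pack confirms the counts behind Rule II here. In this sketch, representing cards as (rank, suit) pairs and the helper names are our own choices, and each card is assumed equally likely to be drawn.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs, each equally likely


def is_heart(card) -> bool:
    return card[1] == "hearts"


def is_queen_of_spades(card) -> bool:
    return card == ("Q", "spades")


pr_e = Fraction(sum(map(is_heart, deck)), len(deck))
pr_f = Fraction(sum(map(is_queen_of_spades, deck)), len(deck))
# E and F never occur together, so counting "E or F" directly should agree with Rule II.
pr_e_or_f = Fraction(sum(is_heart(c) or is_queen_of_spades(c) for c in deck), len(deck))

print(pr_e, pr_f, pr_e_or_f, pr_e + pr_f)
```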


4.53 Multiplication of probabilities for independent events.

Now suppose there are n equally likely cases and m favorable cases for
the event E, and N equally likely cases and M favorable cases for the event F,
so that

$$\Pr(E) = \frac{m}{n}, \qquad \Pr(F) = \frac{M}{N}.$$

If each of the n cases when we consider E can be combined with each of the N 
cases when we consider F, so that it can be agreed that the nN combined cases for 
the joint event "E and F" can be considered equally likely, then we say that the 
two events E and F are independent . 

Rule III: Multiplication. If E and F are independent in the sense just defined,
then

$$\Pr(E \text{ and } F) = \Pr(E)\cdot\Pr(F). \tag{4.11}$$

Since each of the n cases when considering E can be combined with each
of the N cases when considering F, we have altogether nN possible cases when we
consider the joint event "E and F". Each favorable case for E combines with
each favorable case for F to give a favorable case for the joint event "E and F".
Thus there are mM favorable cases for the event "E and F". Hence

$$\Pr(E \text{ and } F) = \frac{mM}{nN} = \frac{m}{n}\cdot\frac{M}{N} = \Pr(E)\cdot\Pr(F).$$

Similarly, we see that if there are several events E, F, G, ..., all
independent, we have

$$\Pr(E \text{ and } F \text{ and } G \text{ and } \ldots) = \Pr(E)\cdot\Pr(F)\cdot\Pr(G)\cdots. \tag{4.12}$$


Rule III can be illustrated by the following 

Example: From a pack of playing cards two cards are drawn at random successively,
the first being replaced before the second is drawn. What is the probability that
the first is a "heart" and the second not a "king"?

E is the event: "The first card drawn is a heart"; F is the event:
"The second is not a king". We have

$$\Pr(E) = \frac{13}{52} = \frac{1}{4}.$$

Notice that the event F is of the form "not king". Since there are 4 kings, the
probability of "king" is 4/52; by the rule of complementation,

$$\Pr(F) = \Pr(\text{not king}) = 1 - \Pr(\text{king}) = 1 - \frac{4}{52} = \frac{12}{13}.$$

Since the first card is replaced before the second drawing, any possibility
for the first drawing can be combined with any possibility for the second.
Thus, we can apply Rule III and obtain

$$\Pr(E \text{ and } F) = \Pr(E)\cdot\Pr(F) = \frac{1}{4}\cdot\frac{12}{13} = \frac{3}{13}.$$
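Because the first card is replaced, all 52 × 52 ordered pairs of draws are equally likely, so the product rule can be checked by counting pairs directly. In the sketch below, cards are represented as (rank, suit) pairs, which is our own choice of representation.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(RANKS, SUITS))

# With replacement, every (first, second) pair is one of 52 * 52 equally likely cases.
pairs = list(product(deck, repeat=2))
favorable = sum(1 for first, second in pairs if first[1] == "hearts" and second[0] != "K")

pr_joint = Fraction(favorable, len(pairs))
pr_e = Fraction(13, 52)
pr_f = 1 - Fraction(4, 52)  # complementation: Pr(not king) = 1 - Pr(king)
print(favorable, pr_joint, pr_e * pr_f)
```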


Sometimes we have to apply a rule similar to Rule III in evaluating
the number of favorable cases in making a probability calculation; in other words,
we apply a multiplication procedure in evaluating the number of favorable cases.
An example will make this clear.


Example: In a game of bridge, what is the probability of a hand of 13 cards
consisting of 5 spades, 3 hearts, 3 diamonds and 2 clubs?
The total number of cases n is the number of ways of picking 13 cards
from 52. This is a problem in combinations. We have

$$n = C^{52}_{13}.$$

The number of favorable cases m is obtained as follows: We want to pick 5 spades
out of 13 spades; the number of ways of doing this is $C^{13}_5$ by the Rule on
Number of Combinations in Section 4.4. Similarly, the number of ways of picking
3 hearts out of 13 is $C^{13}_3$, the number of ways of picking 3 diamonds out of
13 is also $C^{13}_3$, and the number of ways of picking 2 clubs out of 13 is
$C^{13}_2$. The combined number of ways of doing all these things jointly is
obtained by multiplying the four numbers, i.e.,

$$m = C^{13}_5 \cdot C^{13}_3 \cdot C^{13}_3 \cdot C^{13}_2.$$

This is the number of favorable cases m. Thus, applying the definition of
probability, we have

$$\Pr(\text{5 spades, 3 hearts, 3 diamonds, 2 clubs}) = \frac{m}{n} = \frac{C^{13}_5 \cdot C^{13}_3 \cdot C^{13}_3 \cdot C^{13}_2}{C^{52}_{13}}$$

$$= \frac{\dfrac{13!}{5!\,8!}\cdot\dfrac{13!}{3!\,10!}\cdot\dfrac{13!}{3!\,10!}\cdot\dfrac{13!}{2!\,11!}}{\dfrac{52!}{39!\,13!}} = \frac{39!\,(13!)^5}{2!\,(3!)^2\,5!\,8!\,(10!)^2\,11!\,52!}.$$
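This calculation is easy to get wrong by hand, so the short Python sketch below evaluates it two ways: from the product of combinations using `math.comb`, and from the final all-factorial expression. Exact `Fraction` arithmetic allows the two results to be compared for equality.

```python
from fractions import Fraction
from math import comb, factorial as f

# Favorable cases m: choose the spades, hearts, diamonds and clubs independently.
m = comb(13, 5) * comb(13, 3) * comb(13, 3) * comb(13, 2)
# Possible cases n: any 13 of the 52 cards.
n = comb(52, 13)
p = Fraction(m, n)

# The same probability in the all-factorial form.
p_factorial = Fraction(f(39) * f(13) ** 5,
                       f(2) * f(3) ** 2 * f(5) * f(8) * f(10) ** 2 * f(11) * f(52))

print(p == p_factorial, float(p))
```

The two forms agree, and the probability comes out a little under 0.013.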


4.54 Multiplication of probabilities when events are not independent; conditional
probabilities.

We often have situations in which we have to calculate the probability
of the joint occurrence of two events E and F when they are not independent. If
a trial can result in the occurrence of E or "not E" and can also result in the
occurrence of F or "not F", then the result of a trial will belong to one and
only one of the four classes: E and F; E and "not F"; "not E" and F; "not E" and
"not F". If $n_{11}$, $n_{12}$, $n_{21}$, $n_{22}$ are the numbers of cases favorable to
these four classes respectively, where $n_{11} + n_{12} + n_{21} + n_{22} = n$ is the total
number of possible cases, we can display the situation in the four-fold table
shown in Table 4.3.


Now the probability of the occurrence of E and F is

$$\Pr(E \text{ and } F) = \frac{n_{11}}{n}.$$

But notice that this can be written as

$$\Pr(E \text{ and } F) = \left(\frac{n_{11}+n_{21}}{n}\right)\cdot\left(\frac{n_{11}}{n_{11}+n_{21}}\right).$$

Now $n_{11}/(n_{11}+n_{21})$ is the probability of event E assuming that event F has
occurred. We write this as Pr(E|F), read "probability of E given F", and it is
called the conditional probability of the occurrence of E, given that F has
occurred. The ratio $(n_{11}+n_{21})/n$ is clearly the probability of the occurrence
of F, i.e., Pr(F). Therefore, we have a rule for multiplication when the events
are not independent.


TABLE 4.3

Four-fold Table

               F              Not F          Total
E              n_11           n_12           n_11 + n_12
Not E          n_21           n_22           n_21 + n_22
Total          n_11 + n_21    n_12 + n_22    n


Rule IV: If E and F are not independent, then the probability of the joint
occurrence of E and F is given by the formula

$$\Pr(E \text{ and } F) = \Pr(F)\cdot\Pr(E \mid F). \tag{4.13}$$

It should be noted that if Pr(E|F) = Pr(E|"not F"), then each of these quantities
will be equal to Pr(E) (i.e., in this case the numbers in the columns (and also
rows) in Table 4.3 will be proportional to each other), and we have a situation
in which we have independence, and in which Rule IV reduces to Rule III.
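Formula (4.13) and the remark on independence can be checked numerically from any four-fold table. The sketch below uses a hypothetical helper (`fourfold_probabilities` is our own name) that takes the four counts $n_{11}, n_{12}, n_{21}, n_{22}$ laid out as in Table 4.3. As data it uses a standard 52-card pack, with E = "honor card" (ace, king, queen, jack or ten) and F = "spade", so that $n_{11} = 5$, $n_{12} = 15$, $n_{21} = 8$, $n_{22} = 24$.

```python
from fractions import Fraction


def fourfold_probabilities(n11: int, n12: int, n21: int, n22: int) -> dict[str, Fraction]:
    """Probabilities from a four-fold table laid out as in Table 4.3.

    Rows are E / not E, columns are F / not F; all n cases are equally likely.
    """
    n = n11 + n12 + n21 + n22
    return {
        "Pr(E)": Fraction(n11 + n12, n),
        "Pr(F)": Fraction(n11 + n21, n),
        "Pr(E and F)": Fraction(n11, n),
        "Pr(E|F)": Fraction(n11, n11 + n21),      # restrict attention to the F column
        "Pr(E|not F)": Fraction(n12, n12 + n22),  # restrict attention to the not-F column
    }


p = fourfold_probabilities(5, 15, 8, 24)
# Rule IV: Pr(E and F) = Pr(F) * Pr(E|F).
print(p["Pr(E and F)"], p["Pr(F)"] * p["Pr(E|F)"])
# Here Pr(E|F) = Pr(E|not F) = Pr(E), so these two events happen to be independent.
print(p["Pr(E|F)"], p["Pr(E|not F)"], p["Pr(E)"])
```

With counts whose columns are not proportional (for instance the hypothetical counts 6, 14, 7, 25), Pr(E|F) and Pr(E|not F) differ, while (4.13) still holds.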

Similarly, if we have three events E, F, G, which are not independent,
we have the formula

$$\Pr(E \text{ and } F \text{ and } G) = \Pr(G)\cdot\Pr(F \mid G)\cdot\Pr(E \mid F \text{ and } G). \tag{4.14}$$


This can be extended to any number of events. 

If we return to the formula given in Rule IV, we can write (assuming
Pr(F) is not zero)

$$\Pr(E \mid F) = \frac{\Pr(E \text{ and } F)}{\Pr(F)}, \tag{4.15}$$

which gives us a way of calculating conditional probabilities. We shall consider
an example to illustrate Rule IV.

Example: Suppose three bad light bulbs get mixed up with 12 good ones, and that
you start testing the bulbs one by one until you have found all three defectives.
What is the probability that you will find the last defective on the seventh
testing?

Let F be the event of "finding 2 defectives among the first 6 tested"
and E be the event of "finding the third defective on the seventh testing". Now
finding Pr(F) is just a combinatorial problem (the first 6 tested must include
2 of the 3 defectives and 4 of the 12 good bulbs), i.e.,

$$\Pr(F) = \frac{C^3_2 \cdot C^{12}_4}{C^{15}_6} = \frac{3 \cdot 495}{5005} = \frac{27}{91}.$$

Pr(E|F) is the probability of finding the third defective on the seventh testing
after event F has happened. When F has occurred, we know that there are 9 bulbs
left and that one is defective. The probability of picking it on the seventh
testing (the first after F has occurred) is the desired probability Pr(E|F), i.e.,

$$\Pr(E \mid F) = \frac{1}{9}.$$

Hence

$$\Pr(E \text{ and } F) = \Pr(F)\cdot\Pr(E \mid F) = \frac{C^3_2 \cdot C^{12}_4}{C^{15}_6}\cdot\frac{1}{9} = \frac{3}{91},$$

which is the desired probability.
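The bulb example can be checked two ways in a short sketch: first by Rule IV exactly as in the text, and then by brute force. The brute-force part assumes, as the example implicitly does, that the testing order is a random arrangement of the 15 bulbs. Under that assumption the positions of the 3 defectives form an equally likely 3-element subset of 1, ..., 15, and the last defective is found on the seventh test exactly when the largest of these positions is 7.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Rule IV, as in the text: Pr(E and F) = Pr(F) * Pr(E | F).
pr_f = Fraction(comb(3, 2) * comb(12, 4), comb(15, 6))
pr_e_given_f = Fraction(1, 9)
by_rule = pr_f * pr_e_given_f

# Brute force: every set of 3 positions (out of 1..15) for the defectives is equally likely.
position_sets = list(combinations(range(1, 16), 3))
last_at_seven = sum(1 for s in position_sets if max(s) == 7)
by_count = Fraction(last_at_seven, len(position_sets))

print(by_rule, by_count)
```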


4.55 Addition of probabilities when events are not mutually exclusive.

There are situations in which we need to find the probability of the
occurrence of "E or F" when E and F are not mutually exclusive. In this case
we have the following rule:

Rule V: If E and F are events which are not mutually exclusive, then the
probability of the occurrence of E or F is given by the formula

$$\Pr(E \text{ or } F) = \Pr(E) + \Pr(F) - \Pr(E \text{ and } F). \tag{4.16}$$

The truth of this rule is clear if we look at Table 4.3. Remember that the event
"E or F" means that "either E or F or both occur". The number of cases in
Table 4.3 in which "either E or F or both occur" is seen to be $n_{11} + n_{12} + n_{21}$.
Hence

$$\Pr(E \text{ or } F) = \frac{n_{11}+n_{12}+n_{21}}{n} = \frac{n_{11}+n_{12}}{n} + \frac{n_{11}+n_{21}}{n} - \frac{n_{11}}{n} = \Pr(E) + \Pr(F) - \Pr(E \text{ and } F).$$

Notice that $n_{11}$ is the number of cases in which both E and F occur. Hence, if
Pr(E and F) = 0 (i.e., $n_{11} = 0$), then E and F would be mutually exclusive.
Therefore Rule V would reduce to

Pr(E or F) = Pr(E) + Pr(F),

which is Rule II.


In the case of three events E, F, G, which are not mutually exclusive,
the event "E or F or G" means that "either E or F or G, any two of them, or all
three occur". In this case Rule V becomes

$$\begin{aligned}
\Pr(E \text{ or } F \text{ or } G) = {} & \Pr(E) + \Pr(F) + \Pr(G) \\
& - \Pr(E \text{ and } F) - \Pr(E \text{ and } G) - \Pr(F \text{ and } G) \\
& + \Pr(E \text{ and } F \text{ and } G).
\end{aligned} \tag{4.17}$$


To illustrate Rule V, let us consider the following

Example: If a card is dealt from a pack, what is the probability it will be an
honor card or a spade?

Let E be the event "an honor card" and F the event "a spade"; then "E
or F" is the event "an honor card or a spade or both" and "E and F" is the event
"an honor card and a spade". We have

$$\Pr(E) = \frac{20}{52}, \qquad \Pr(F) = \frac{13}{52}, \qquad \Pr(E \text{ and } F) = \frac{5}{52},$$

and therefore

$$\Pr(E \text{ or } F) = \frac{20}{52} + \frac{13}{52} - \frac{5}{52} = \frac{28}{52}.$$


The numbers of cases involved in this problem may be represented in a
table similar to Table 4.3 as follows:

TABLE 4.4

Table Showing Classification of Cards in a Pack
with Respect to Honor Cards and Spades

                        E                Not E
                        (an honor card)  (not an honor card)  Total
F (a spade)              5                8                   13
Not F (not a spade)     15               24                   39
Total                   20               32                   52
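Rule V can be checked on this example by enumerating the pack. The sketch uses the definition of honor cards given at the start of Exercise 4.5 (ace, king, queen, jack and ten of each suit) and represents cards as (rank, suit) pairs, a representation of our own choosing. The direct count of "honor card or spade" and the Rule V calculation should both give 28/52.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
HONORS = {"A", "K", "Q", "J", "10"}  # honor cards: ace, king, queen, jack and ten
deck = list(product(RANKS, SUITS))


def pr(event) -> Fraction:
    """Classical probability: favorable cards / 52, every card equally likely."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def honor(card) -> bool:
    return card[0] in HONORS


def spade(card) -> bool:
    return card[1] == "spades"


direct = pr(lambda c: honor(c) or spade(c))
rule_v = pr(honor) + pr(spade) - pr(lambda c: honor(c) and spade(c))
print(direct, rule_v)
```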



4.56 Euler diagrams.

An effective schematic representation of the various combinations of
events and their complements, and the probabilities (or percentages of cases)
associated with them in case the events are not mutually exclusive, is provided
by Euler diagrams. For convenience let us use the following shortened notation:

$\bar{E}$ = "not E"

$E+F$ = "E or F"

$E\cdot F$ = "E and F"

with similar meanings for $\bar{F}$, $\bar{E}+F$, $\bar{E}\cdot\bar{F}$, etc. In this notation we
sometimes refer to $E$, $\bar{E}$, $E+F$, $E\cdot F$, etc., as classes as well as events.

We may then represent the various possible combinations of events given
in Table 4.3 by the Euler diagram in Figure 4.1.





Euler Diagram for the Information in Table 4.3 
Figure 4.1 

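The possible events are $E\cdot F$, $E\cdot\bar{F}$, $\bar{E}\cdot F$ and $\bar{E}\cdot\bar{F}$, and they are
represented by the regions into which the rectangle is divided. The numbers of
cases favorable to these various events are given in the parentheses. We may then
write relations among the various events in an algebraic form:

$$\begin{aligned}
E &= E\cdot F + E\cdot\bar{F} \\
F &= E\cdot F + \bar{E}\cdot F \\
E + F &= E\cdot F + E\cdot\bar{F} + \bar{E}\cdot F
\end{aligned}$$

from which we can write down the probabilities:

$$\Pr(E) = \frac{n_{11}+n_{12}}{n}, \qquad \Pr(F) = \frac{n_{11}+n_{21}}{n},$$

$$\Pr(E+F) = \frac{n_{11}+n_{12}+n_{21}}{n} = \frac{n_{11}+n_{12}}{n} + \frac{n_{11}+n_{21}}{n} - \frac{n_{11}}{n},$$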
or more briefly,

$$\Pr(E+F) = \Pr(E) + \Pr(F) - \Pr(E\cdot F),$$

which is simply another way of writing formula (4.16) using the shortened notation.

The Euler diagram for the example illustrating Rule V is shown in
Figure 4.2. In the diagram $E\cdot F$ is the class of cards in which each card is an
E (honor card) and an F (spade). $E\cdot F$ also refers to the event of drawing a
card which is an E and an F. Similar interpretations apply to $E\cdot\bar{F}$, $\bar{E}\cdot F$,
and $\bar{E}\cdot\bar{F}$.



Euler Diagram for the Information in Table 4.4 
Figure 4.2 


The Euler diagram is more useful in the case of three events E, F, G
and their complements $\bar{E}$, $\bar{F}$ and $\bar{G}$. In this case we have the Euler
diagram shown in Figure 4.3.

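Algebraically, we may write

$$E = E\cdot F\cdot G + E\cdot F\cdot\bar{G} + E\cdot\bar{F}\cdot G + E\cdot\bar{F}\cdot\bar{G}$$

and

$$\Pr(E) = \frac{n_{111}+n_{112}+n_{121}+n_{122}}{n},$$

where, for example, $n_{112}$ is the number of cases in $E\cdot F\cdot\bar{G}$: the three
subscripts refer to E, F and G in turn, with 1 for the event and 2 for its
complement. There are similar expressions for F, G, Pr(F) and Pr(G). (Note that
event E is represented by all regions in the circle marked (E), and similarly for
event F and event G.)

For the event "E or F" (i.e., $E+F$) we take all regions falling in the
two circles marked (E) and (F), i.e.,

$$E + F = E\cdot F\cdot G + E\cdot F\cdot\bar{G} + E\cdot\bar{F}\cdot G + E\cdot\bar{F}\cdot\bar{G} + \bar{E}\cdot F\cdot G + \bar{E}\cdot F\cdot\bar{G},$$

and we have

$$\begin{aligned}
\Pr(E+F) &= \frac{n_{111}+n_{112}+n_{121}+n_{122}+n_{211}+n_{212}}{n} \\
&= \frac{n_{111}+n_{112}+n_{121}+n_{122}}{n} + \frac{n_{111}+n_{112}+n_{211}+n_{212}}{n} - \frac{n_{111}+n_{112}}{n} \\
&= \Pr(E) + \Pr(F) - \Pr(E\cdot F).
\end{aligned}$$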



Euler Diagram for the Occurrence of Various Combinations 
of Events E, F, G and their Complements 


Figure 4.3
Similarly, the event $E + F + G$ is represented by the "sum" of all
regions included anywhere in one or more of the three circles. By writing down
the n's and grouping them properly we find that

$$\Pr(E+F+G) = \Pr(E) + \Pr(F) + \Pr(G) - \Pr(E\cdot F) - \Pr(E\cdot G) - \Pr(F\cdot G) + \Pr(E\cdot F\cdot G),$$

which is formula (4.17) written in the shorter notation.

In applications, it is usually convenient to refer to the event or class
$E\cdot\bar{F}\cdot\bar{G}$ as "E only" and $E\cdot F\cdot\bar{G}$ as "E and F only", and similarly for
other events or classes of these types. The event or class E, it must be
remembered, consists of all four classes inside the circle marked (E).

Consider the following example involving an Euler diagram with three
classes and their complements:

Example: Among the children in a certain school

10% have defective eyes
8% have defective hearing
12% have defective teeth
4% have defective eyes and teeth
2% have defective hearing and teeth
2% have defective eyes and hearing
1% have defective eyes and hearing and teeth.

Construct an Euler diagram for these data.

Let us refer to defective eyes as class E, defective hearing
as class F and defective teeth as class G. Then we have
10% belong to E
8% belong to F
12% belong to G
4% belong to $E\cdot G$
2% belong to $F\cdot G$
2% belong to $E\cdot F$
1% belong to $E\cdot F\cdot G$

The Euler diagram is as follows:

From this figure we see that 5% of the children belong to $E\cdot\bar{F}\cdot\bar{G}$, i.e.,
have defective eyes only (do not have defective hearing or teeth). Similar
interpretations hold for other classes.
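The regions of a three-circle Euler diagram can be computed from the class percentages by working inward from the triple overlap. The helper below (`euler_regions`, a name of our own) does this and is applied to the percentages listed in the school example. The "eyes only" figure of 5% depends only on the E, $E\cdot F$, $E\cdot G$ and $E\cdot F\cdot G$ entries. As a consistency check, the seven regions should add up to $\Pr(E+F+G)$ as given by formula (4.17).

```python
def euler_regions(e, f, g, ef, eg, fg, efg):
    """Percentages in the seven regions of a three-circle Euler diagram.

    Inputs are the percentages in E, F, G, in each pairwise overlap, and in E.F.G.
    """
    return {
        "E and F and G": efg,
        "E and F only": ef - efg,
        "E and G only": eg - efg,
        "F and G only": fg - efg,
        "E only": e - ef - eg + efg,
        "F only": f - ef - fg + efg,
        "G only": g - eg - fg + efg,
    }


# Percentages from the school example (E = eyes, F = hearing, G = teeth).
regions = euler_regions(e=10, f=8, g=12, ef=2, eg=4, fg=2, efg=1)
for name, pct in regions.items():
    print(f"{name}: {pct}%")

# Formula (4.17): the seven regions together make up E + F + G.
union = 10 + 8 + 12 - 2 - 4 - 2 + 1
print(sum(regions.values()), union)
```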


4.57 General remarks about calculating probabilities.

In many probability problems such as those in poker, bridge and other 
card games, it is often simpler to go back to the definition of probability 
(Definition I) and compute the numbers of favorable and possible cases as in
the original definition of probability, than to try to make numerous specific 
applications of the foregoing rules. Let us consider a typical example. 

Example: What is the probability of getting "four of a kind" in a poker hand?

This means that of the five cards in a poker hand four are of the same
kind and the remaining one is arbitrary. There are 13 different "kinds", and we
can pick any one of them. For the remaining card we have 52 - 4 = 48
possibilities. Hence

$$m = 13 \cdot 48$$

and

$$n = C^{52}_5.$$

Hence the required probability is

$$\frac{m}{n} = \frac{13\cdot 48}{C^{52}_5} = \frac{13\cdot 48\cdot 5!\,47!}{52!} = \frac{1}{4165}.$$
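The count for four of a kind can be verified with exact arithmetic. Here is a minimal sketch using `math.comb` and `fractions.Fraction`:

```python
from fractions import Fraction
from math import comb

m = 13 * 48          # choose the rank of the four, then any one of the other 48 cards
n = comb(52, 5)      # all five-card hands
p = Fraction(m, n)
print(n, p)
```

It reports 2,598,960 possible hands and a probability of 1/4165.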

Exercise 4.5.

(In a pack of playing cards, the honor cards are understood to be the ace, king,
queen, jack and ten of each suit.)

1. If a die is rolled twice, what is the probability that the first roll yields
a 3 or 4, and the second anything but 3?

2. A bag contains 6 black and 4 white balls. Three balls are drawn. What is
the probability that 2 are black and 1 is white? That 1 is black and 2 are white?

3. If four people stand in a row, what is the chance that 2 particular persons
designated in advance are next to each other? That they are not next to each 
other? 

4. What is the probability that a bridge hand will consist of 5 spades, 4 clubs,
3 diamonds, 1 heart? Of 2 aces, a king, a queen, 2 tens and no other honor cards?

5. In a roll of 6 dice, what is the probability of getting exactly 5 faces alike?

6. If we draw two cards from a pack of playing cards, what is the probability
that they form a pair (two cards with the same number)? If we draw two cards
twice, the first two being replaced before the second two are drawn, what is the
probability that we get a pair in the first drawing and exactly one ace in the
second?

7. What is the probability of getting a "full house" (three cards of one kind and
two of another) in a poker hand?

8. A bag contains 3 white and 2 black balls; another contains 2 white and 1 black
ball. If we choose a bag at random and then draw a ball from it, what is the 
probability of getting a white ball? If all balls are poured into one bag and a 
ball is drawn, what is the probability of getting a white ball? 

9. Six men and their wives play three tables of bridge. If the men are paired
with the women by drawing score cards, what is the probability that each man will
draw his wife for his partner?

10. What is the probability that the birthdays of three particular students will
fall on three different days of the year? That at least two of them will have
birthdays on the same day of the year?

11. The probability of a man aged 60 dying within one year is .025, and the
probability of a woman aged 55 dying within one year is .011. If a man and his wife
are 60 and 55 respectively, what is the probability of their both living a year?
Of at least one of them dying within a year? Of at least one of them living a
year?

12. A buyer will accept a lot of 100 articles if a sample of 5 picked at random
and inspected contains no defectives. What is the probability that he will
accept the lot if it contains 10 defective articles?

*13. A man has two Indian pennies and two Lincoln pennies in each of two pockets.
Under which of the following conditions is the probability of getting two Lincoln
pennies highest: (a) draw one penny from each pocket, (b) draw two pennies from
one pocket, or (c) put all 8 pennies in one pocket and draw two pennies? Work
out the probability for each case.

14. If we draw 3 cards from a pack, each time replacing the card drawn before the 
next drawing, what is the probability that at least one of the cards drawn is a 
spade? 

15. If a person starts dealing off cards from a pack of playing cards, what is 
the probability that the 48th card dealt is the last red one? That the xth card 
dealt is the yth heart to be dealt? 


16. If a card is drawn from a pack of playing cards, what is the probability it 
will be "an ace or a spade or an honor card"? 


*17. A bowl of 25 beads painted with red and green luminous and non-luminous
paint has the following composition:

          Luminous    Non-Luminous    Total
Red              6              12       18
Green            1               6        7
Total            7              18       25
(a) If a bead is drawn, what is the probability it is
green or luminous or both? 

(b) If a bead is drawn at night and is seen to be non- 
luminous, what is the probability it is red? 

(c) If a bead is drawn in daylight and is noted to be 
red, what is the probability it is non-luminous? 

(d) Make an Euler diagram for the composition of beads 
in the bowl. 

18. In a certain population, the percentages of persons reading magazines A, B,
C and various combinations are as follows:

A: 9.8%           A and B: 5.1%

B: 22.9%          A and C: 3.7%

C: 12.1%          B and C: 6.0%

                  A and B and C: 2.4%

(a) Make an Euler diagram from these data.

(b) What percent of the population read at least one
of the 3 magazines?

(c) What is the probability that a person taken at
random from this population would be a reader of
A or B or both?

(d) Of those persons reading at least one of the
magazines, what percentage read at least two magazines?

(e) What percentage of the population read:

A only;   A and B only;

B only;   A and C only;

C only;   B and C only?

4.6 Mathematical Expectation.

Suppose $E_1, E_2, \ldots, E_k$ are mutually exclusive and exhaustive events,
and that $P_1, P_2, \ldots, P_k$ are the respective probabilities of these events
occurring. Suppose a person, A, receives an amount of money $M_1$ if $E_1$ happens,
an amount $M_2$ if $E_2$ happens, ..., an amount $M_k$ if $E_k$ happens. Then A's
mathematical expectation of gain or winnings E(M) is defined as

$$E(M) = M_1P_1 + M_2P_2 + \cdots + M_kP_k.$$
If A pays an amount of money equal to $M_1P_1 + M_2P_2 + \cdots + M_kP_k$,
we say A pays a fair price for this expectation. This concept of mathematical
expectation and fair price enters into all gambling systems and insurance plans.
More specifically, if any system of gambling is looked at from the point of view
of an individual gambler, we find him paying a sum of money for the privilege of
receiving a sum of money or nothing, depending on the outcome of the game. For
example, an Irishman buys a lottery ticket for 10 shillings, knowing that the
probability of getting a large prize of 30,000 pounds is very small and the
probability of getting nothing is very large. A person aged 21 pays an 18-dollar
premium to a life insurance company for a one-year term insurance policy in
return for a guarantee that the company will pay his beneficiary 1000 dollars if
he dies during the year (and nothing if he lives).

Let us illustrate by the following 

Example: Suppose A tosses two coins and receives 2 dollars from B if two heads
appear, 1 dollar if one head appears and nothing if no heads appear. How much
should A pay B in advance for the privilege of playing this game?

It will be seen that three events are involved: the event of 2 heads,
the event of 1 head and the event of 0 heads. The probabilities of these events
are 1/4, 1/2 and 1/4, respectively. If this game were played a large number of
times, A would receive 2 dollars in about 1/4 of the trials, 1 dollar in about
1/2 of the trials and nothing in the remaining trials. Thus he would receive in
the long run an average of 2 × 1/4 = 1/2 dollar per trial from the first kind of
event (2 heads), an average of 1 × 1/2 = 1/2 dollar for the second kind of event
(1 head), and nothing from the third. Or his average winnings per trial in the
long run are

$$2 \times \tfrac{1}{4} + 1 \times \tfrac{1}{2} = 1 \text{ dollar}.$$

The mathematical expectation of A's gain or winnings is 1 dollar. A should pay B
1 dollar for the privilege of playing this game if it is to be a fair game.
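Here is a short sketch of the definition of mathematical expectation applied to the two-coin game. `expectation` is a hypothetical helper name. The second part only illustrates the "long run" argument: it simulates many plays with Python's `random` module (fixed seed) and reports the average winnings per play, which should come out close to 1 dollar. In this game the payment in dollars happens to equal the number of heads, which is what the simulation adds up.

```python
import random
from fractions import Fraction


def expectation(amounts, probabilities):
    """E(M) = M1*P1 + M2*P2 + ... + Mk*Pk for mutually exclusive, exhaustive events."""
    assert sum(probabilities) == 1, "the events must be exhaustive"
    return sum(m * p for m, p in zip(amounts, probabilities))


# Two coins: 2 dollars for two heads, 1 dollar for one head, nothing for no heads.
fair_price = expectation([2, 1, 0], [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)])
print(fair_price)

# Long-run average winnings per play over simulated games (seed fixed for repeatability).
# The payment in dollars equals the number of heads, so we add up heads.
rng = random.Random(1)
plays = 100_000
total = sum(sum(rng.random() < 0.5 for _ in range(2)) for _ in range(plays))
print(total / plays)
```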

Exercise 4.6.

1. If A received 1 cent for every dot that appears in throwing two dice, what
would be the fair price for playing this game? 

2. A throws four of his pennies. If he obtains more than two heads he receives
a dime for each head and also keeps his pennies. Otherwise he forfeits his
pennies to B. How much should A pay B for every play in order to make this game
fair?

3. A pays B one dollar and three dice are rolled. A receives 2 dollars if an
ace appears, 4 dollars if 2 aces appear, and 8 dollars if 3 aces appear; otherwise
he gets nothing. Is this a fair game? If not, how much should A receive for 3
aces to make it fair?

4. How much should a person pay to receive one dollar if he gets a face card at
least once in cutting a deck of cards 3 times and nothing otherwise? 

5. The probability that a man aged 50 will live another year is .988. How large
a premium should he pay for a 1000-dollar term life insurance policy for one year
(ignoring insurance company charges for administration, etc.)?

6. Suppose A bets you 5 cents against x cents that two persons picked at random
will have birthdays in the same month. Assuming a birthday to be equally likely 
to fall in any month, what should be the value of x for this to be a fair bet? 


4.7 Geometric Probability.

There are elementary probability problems involving the position of an object
in a region which require an extension of Definition I of Section 4.1.

For example, suppose a box 12 inches square has a bottom made of 3-inch 
boards such that 3 cracks between boards are visible. If a penny is put in the 
box and shaken around, what is the probability that when the shaking stops, the 
penny will not overlap a crack? 

In this problem, consider E as the event of not overlapping a crack.
We consider the center of the penny as a point and find the areas of two regions:

(1) the area C of the region R in which the center of the penny could fall, and

(2) the area $C_E$ of the region $R_E$ in which the center of the penny must lie so
that no part of the penny will overlap a crack. Then if it could be agreed that
it is equally likely for the center of the penny to fall anywhere in R, we would
define the probability of event E as the ratio of area $C_E$ to area C. The radius
of a penny is 3/8". Since the center must stay 3/8" from each wall, the region R
in which the center of the penny can fall is a square 12 - 2(3/8) = 11 1/4" on a
side; its area C is 126 9/16 sq. in. Each 3-inch board leaves a strip
3 - 2(3/8) = 2 1/4" wide for the center, so the region $R_E$ consists of 4
rectangles 2 1/4 by 11 1/4 inches; its area $C_E$ is 101 1/4 sq. in. The
probability of the occurrence of E is therefore

$$\Pr(E) = \frac{C_E}{C} = \frac{101.25}{126.5625} = .8.$$
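The penny-and-cracks calculation can be sketched in Python, assuming (as 3-inch boards in a 12-inch box imply) that the three cracks run in one direction, 3, 6 and 9 inches from one wall. The first part computes the two areas exactly. The second part is a Monte Carlo illustration of the "equally likely" assumption, dropping the center uniformly in R. Only the coordinate across the boards matters, because every crack runs parallel to the other coordinate.

```python
import random
from fractions import Fraction

BOX = Fraction(12)        # inside of the box, inches
BOARD = Fraction(3)       # width of each board, inches
RADIUS = Fraction(3, 8)   # radius of the penny, inches, as stated in the text

# Exact ratio of contents (areas).
side = BOX - 2 * RADIUS                       # region R for the center is side x side
area_r = side * side
area_re = 4 * (BOARD - 2 * RADIUS) * side     # region R_E: one strip per board
print(area_r, area_re, area_re / area_r)

# Monte Carlo check: drop the center uniformly across the boards and test whether
# the penny clears every crack (cracks assumed at 3, 6 and 9 inches).
rng = random.Random(0)
trials = 200_000
hits = 0
for _ in range(trials):
    x = rng.uniform(float(RADIUS), float(BOX - RADIUS))
    if all(abs(x - crack) >= float(RADIUS) for crack in (3, 6, 9)):
        hits += 1
print(hits / trials)
```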


We could give other two-dimensional examples in geometric probability, as well
as one-dimensional and three-dimensional examples. These suggest that we can
usefully make the following extension of Definition I to cover situations in
geometric probability. In the definition we will take the word content to
mean the length, area or volume of a region, depending on whether the definition
is applied to problems involving one-, two- or three-dimensional regions.


Definition of Geometric Probability. If an event E can happen by the occurrence
of a point in a region R_E within a region R, all points in R being considered by
mutual agreement to be equally likely, then the probability of the event E is
defined as C_E/C, or

$$\Pr(E) = \frac{C_E}{C},$$

where C_E is the content of R_E and C is the content of R.

We can apply all of the rules of probability discussed in Section 4.4
to geometric probability.


Exercise 4.7.

1. Suppose you drop a penny on the tiled floor of a 10' by 10' room, the tiles
being one-inch black and white squares, arranged in a checkerboard pattern, with
no mortar between tiles. What is the probability that the penny will

(a) lie completely within some tile?

(b) overlap parts of at least two tiles?

(c) overlap parts of four tiles?

2. If a needle two inches long is dropped on a floor and intersects a crack between
two floor boards, what is the probability that the needle would be "cut" by the
crack less than one eighth of an inch from its center? If 10 throws of the
needle are taken in which the needle intersects a crack, what is the probability
that the needle is never "cut" within its middle third?

3. Suppose a bomb dropped from high altitude is equally likely to hit any part
of a factory, if it hits it at all. What is the probability that if a bomb hits
a factory covering 100,000 square feet of ground, it will hit the power plant
which covers 2000 square feet of ground? If 10 bombs hit the factory, what is
the probability of at least one hit on the power plant? How many hits on the
factory would be required in order to have a probability of 0.9 of getting at
least one hit on the power plant?

4. Suppose 52 barrage balloons are moored at altitudes of 5000 feet by cables,
and that they are arranged in a straight line, the balloons being spaced 1000 feet
apart. If a plane having a wing span of 200 feet flies through this barrage at
night, what is the probability it hits a cable

(a) if it flies through the plane of the cables at right angles?

(b) if it flies (level flight) through the plane of the cables at 30° (to the
plane of the cables)?

What is the probability that three planes will fly through the barrage as in (a)
without striking cables? As in (b) without striking cables?

5. A petty gambling joint at a county fair operates as follows: the top of a
table is solidly covered with 200 Lucky Strike cigarette packages in 10 rows of
20 packages each. Each package measures 2.25 by 2.75 inches and has a red circle
1.5 inches in diameter. A player pitches a penny onto these cigarette packages.
If the penny falls inside a red circle, he wins a package of cigarettes and gets
his penny back. Otherwise he loses his penny. Assuming that no pennies slide
off the table or overlap its edges, that a penny is equally likely to stop
anywhere on the mat of cigarette packages, and that Lucky Strikes are worth 16
cents a package, is this a fair game? Who has the advantage and what is his
expectation of winnings per 100 throws?


CHAPTER 5. PROBABILITY DISTRIBUTIONS. 


5.1 Discrete Probability Distributions.

5.11 Probability tables and graphs.

In probability problems we often find that what we are really interested
in are the probabilities for a set of events, where the events can be simply
described by numbers. For example, in rolling a pair of dice, we are usually
interested in the various total numbers of dots it is possible to get and in their
probabilities. In other words, we are interested in the probabilities of getting
a total of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 dots. These numbers can be
considered as possible values of the chance quantity X, where X denotes the total
number of dots obtained in rolling two dice. For each of these values which
X can take on, we would have a probability. In general, if we call f(x) the
probability that X = x (i.e., of getting x dots), we can call f(x) the
probability distribution of X.

In case X can take on only certain isolated values on an interval
rather than all values on an interval, we call X a discrete chance quantity.
Thus, in the dice problem X can take on only the isolated values 2, 3, ..., 12
and not all values on the interval between 2 and 12. We use the word discrete
in contrast to the word continuous in describing a chance quantity X in this
section. A continuous chance quantity X (to be discussed in Section 5.2) is one
which can take on any value in an interval. In probability problems it is
usually unnecessary to use the adjectives discrete or continuous in referring
to a chance quantity; the context of the problem makes it clear which case is
under discussion.

The values of x and f(x) for the two-dice problem can be displayed in
a probability table like this:


TABLE 5.1

Probability Table for Total Number of Dots Obtained in Throwing Two Dice

x        2     3     4     5     6     7     8     9     10    11    12
f(x)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

This table gives us the probability of each number of dots which can appear
when throwing two dice. We can represent these probabilities graphically by a
probability bar chart like Figure 5.1. The value of the probability corresponding
to a given value of x is represented by the length of a vertical bar erected
over that value of x. Note that the sum of the probabilities is 1.
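Table 5.1 can be reproduced by counting the 36 equally likely outcomes for two dice. Below is a minimal Python sketch that uses exact fractions from the standard library. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """f(x) for X = total dots on two dice, counting 36 equally likely outcomes."""
    totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {x: Fraction(n, 36) for x, n in sorted(totals.items())}


f = two_dice_distribution()
for x, p in f.items():
    print(x, p)
print("sum of probabilities:", sum(f.values()))
```

Python's `Fraction` always reduces to lowest terms, so 2/36 is printed as 1/18.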



Probability Bar Chart for Table 5.1 
Figure 5.1 

Alternatively, we can represent the probabilities by a probability histogram like 
that in Figure 5.2, where rectangles centered at the various possible values of 
X are erected instead of vertical lines. 



Probability Histogram for Table 5.1

Figure 5.2




We can also make a cumulative probability table like this:

TABLE 5.2

Cumulative Probability Table for Table 5.1

x        2     3     4      5      6      7      8      9      10     11     12
F(x)    1/36  3/36  6/36  10/36  15/36  21/36  26/36  30/36  33/36  35/36  36/36


and represent it graphically by a cumulative probability graph as shown in
Figure 5.3 (where the coordinates of each large plotted dot are from the
cumulative probability table). You should particularly notice that f(x)
represents probabilities and F(x) represents cumulative probabilities for a
population, more or less as the cell frequencies f_i and the cumulative
frequencies do for a sample of grouped data (Section 2.4).

If in Figure 5.3 we pick any value of x we please, say x', the
probability that X ≤ x' (i.e., that the number of dots will be less than or
equal to x') is F(x'), or more briefly Pr(X ≤ x') = F(x').



Figure 5.3

For example, if x' = 5, then Pr(X ≤ 5) = 10/36, or if x' = 9.3, then
Pr(X ≤ 9.3) = 30/36. For any two values x' and x'' (x' < x''), we can also find
the probability that X will exceed x' and be less than or equal to x'', i.e., we
can find Pr(x' < X ≤ x'') by taking the difference between F(x'') and F(x'); that
is, we have Pr(x' < X ≤ x'') = F(x'') - F(x'). For example, if x' = 3 and x'' = 6,
we have Pr(3 < X ≤ 6) = F(6) - F(3) = 15/36 - 3/36 = 12/36, or, to take another
example, Pr(4.5 < X ≤ 7.5) = F(7.5) - F(4.5) = 21/36 - 6/36 = 15/36.
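The cumulative probability F(x') and the difference rule for Pr(x' < X ≤ x'') can be sketched in Python as follows. `F` and `interval_probability` are illustrative names. F is defined for every real x', so non-integer arguments such as 9.3 work.

```python
from fractions import Fraction

# f(x) from Table 5.1, written as numerators over 36 for x = 2, 3, ..., 12.
NUMERATORS = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
f = {x: Fraction(n, 36) for x, n in zip(range(2, 13), NUMERATORS)}


def F(x_prime):
    """Pr(X <= x'): add f(x) over every possible value x not exceeding x'."""
    return sum((p for x, p in f.items() if x <= x_prime), Fraction(0))


def interval_probability(x1, x2):
    """Pr(x1 < X <= x2) = F(x2) - F(x1), assuming x1 < x2."""
    return F(x2) - F(x1)


print(F(5), F(9.3))                    # 10/36 and 30/36, shown in lowest terms
print(interval_probability(3, 6))      # 12/36, shown in lowest terms
print(interval_probability(4.5, 7.5))  # 15/36, shown in lowest terms
```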

These ideas, illustrated for the case of two dice, extend to more general
situations. In the general case, we would have a chance quantity X which
could take on the possible values x_1, x_2, ..., x_k with probabilities f(x_1),
f(x_2), ..., f(x_k) respectively (where f(x_1) + f(x_2) + ... + f(x_k) = 1), and the
probability table would be like this:

TABLE 5.3

Probability Table for a General Discrete Chance Quantity X

x        x_1      x_2      ...   x_k
f(x)    f(x_1)   f(x_2)   ...   f(x_k)


From this table one could construct general graphs similar to those in Figures
5.1, 5.2 and 5.3 for the two-dice problem. The probability distribution f(x) is called
a discrete probability distribution, although we usually omit the word discrete
in any specific problem if it is clear from the problem that the chance quantity
X involved is discrete.


5.12 Remarks on the statistical interpretation of a discrete probability distribution.

Working with a discrete probability distribution is similar to working
with a relative frequency distribution. Whatever manipulation (tables, graphs,
etc.) can be done with one can also be done with the other. In fact, we will
regard a probability distribution as a relative frequency distribution for an
indefinitely large sample in which each measurement can take on one of a discrete
set of values. Since we regard an indefinitely large sample as an indefinitely
large population, we will therefore regard a discrete probability distribution
as a relative frequency distribution for an indefinitely large population
in which each measurement can take on one of a discrete set of values. This is a
theoretical concept but a very useful one. Thus, for example, the probability
distribution shown in Table 5.1 will be regarded as the relative frequency
distribution of the total number of dots obtained on a pair of dice in an
indefinitely large number of "perfect" throws of a pair of "perfect" dice, i.e., the
population of "perfect" throws of a pair of "perfect" dice.

We can calculate means, variances, standard deviations and other quantities
from a probability distribution very much as in the case of a relative
frequency distribution.

5.13 Means, variances and standard deviations of discrete chance quantities.

The mean of a discrete chance quantity X having a probability distribution
f(x) is given by the following formula:

$$\mu = x_1 f(x_1) + x_2 f(x_2) + \cdots + x_k f(x_k) \qquad (5.1)$$

or

$$\mu = \sum_{i=1}^{k} x_i f(x_i). \qquad (5.1a)$$

Or more briefly we may write

$$\mu = E(X). \qquad (5.1b)$$

We usually shorten our statement and say "μ is the mean of the probability
distribution f(x)".

It is customary to refer to μ as the mean of the distribution f(x) or,
more briefly, the mean of X, and also as the mathematical expectation of X. The
expression E(X), which is a shorthand expression for $\sum_{i=1}^{k} x_i f(x_i)$,
is read "expectation of X". Note that E(X) and (1/n)S(X) are shorthand symbols
playing similar roles for probability distributions and sample frequency
distributions respectively.
Example: Find the mean μ of the number of dots obtained in throwing two dice.

We have, by applying formula (5.1) to Table 5.1,

$$\mu = 2\cdot\frac{1}{36} + 3\cdot\frac{2}{36} + 4\cdot\frac{3}{36} + \cdots + 12\cdot\frac{1}{36} = \frac{252}{36} = 7,$$

or

$$\mu = E(X) = 7.$$

The statistical interpretation of this mean is simply this: if two
"perfect" dice are thrown indefinitely many times, the average number of dots
per throw would be 7.
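Formula (5.1a) and its statistical interpretation can both be shown in a short Python sketch. The sketch computes the exact mean of Table 5.1. It also computes the average total over many simulated throws, using the standard `random` module with a fixed seed so that runs are repeatable. The function names are illustrative.

```python
import random
from fractions import Fraction

f = {x: Fraction(n, 36)
     for x, n in zip(range(2, 13), [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])}


def mean(dist):
    """Formula (5.1a): the sum of x_i * f(x_i)."""
    return sum((x * p for x, p in dist.items()), Fraction(0))


def average_of_throws(n, seed=0):
    """Average total number of dots over n simulated throws of two dice."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n)) / n


print(mean(f))                     # 7
print(average_of_throws(100_000))  # near 7 for a large number of throws
```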

Note that if we rewrite the mean of a sample of measurements on X in
a grouped frequency distribution, as given by the familiar formula, in the form

$$\bar{X} = \sum_{i=1}^{k} x_i \frac{f_i}{n}, \qquad (5.2)$$

then the similarity of the formulas for μ and X̄ shows up at once. The relative
frequency f_i/n plays the same role in the definition of X̄ that the probability
f(x_i) plays in the definition of μ.

The variance σ² of a discrete chance quantity X having probability
distribution f(x) (or, more briefly, the variance σ² of the probability
distribution f(x)) is defined as

$$\sigma^2 = (x_1-\mu)^2 f(x_1) + (x_2-\mu)^2 f(x_2) + \cdots + (x_k-\mu)^2 f(x_k) \qquad (5.3)$$

or

$$\sigma^2 = \sum_{i=1}^{k} (x_i-\mu)^2 f(x_i). \qquad (5.3a)$$

A more convenient form for σ² is obtained by writing (5.3a) as follows:

$$\sigma^2 = \sum_{i=1}^{k} (x_i^2 - 2\mu x_i + \mu^2) f(x_i) = \sum_{i=1}^{k} x_i^2 f(x_i) - 2\mu \sum_{i=1}^{k} x_i f(x_i) + \mu^2 \sum_{i=1}^{k} f(x_i).$$

But

$$\sum_{i=1}^{k} x_i f(x_i) = \mu \quad\text{and}\quad \sum_{i=1}^{k} f(x_i) = 1;$$

hence we have

$$\sigma^2 = \sum_{i=1}^{k} x_i^2 f(x_i) - \mu^2. \qquad (5.3b)$$
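As a check on the algebra, the following Python sketch computes the variance of the two-dice distribution of Table 5.1 in two ways: from the definition (5.3a) and from the shortcut (5.3b). The names are illustrative. With exact fractions, both give 35/6.

```python
from fractions import Fraction

f = {x: Fraction(n, 36)
     for x, n in zip(range(2, 13), [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])}


def mean(dist):
    """Formula (5.1a)."""
    return sum((x * p for x, p in dist.items()), Fraction(0))


def variance_by_definition(dist):
    """Formula (5.3a): the sum of (x_i - mu)^2 * f(x_i)."""
    mu = mean(dist)
    return sum(((x - mu) ** 2 * p for x, p in dist.items()), Fraction(0))


def variance_by_shortcut(dist):
    """Formula (5.3b): the sum of x_i^2 * f(x_i), minus mu^2."""
    mu = mean(dist)
    return sum((x * x * p for x, p in dist.items()), Fraction(0)) - mu ** 2


print(variance_by_definition(f), variance_by_shortcut(f))  # 35/6 35/6
```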
Exercise 5.1.

1. Three coins are tossed. Let X be the chance quantity denoting the number of
heads obtained. Write down the probability distribution of X in table form. Draw
a probability bar chart and a cumulative probability graph for the distribution.
Also, find the mean and variance of X.

2. Five chips are marked 1, 2, 3, 4, 5 respectively. Let X be the chance quantity
denoting the sum of the numbers on two chips drawn at random. Write down
the probability distribution of X in table form. Draw the probability bar chart
and the cumulative probability graph of the distribution. Also find the mean
and variance of X.

3. Suppose there are 3 defective articles in a lot of 12. A sample of four
articles is taken at random out of the lot. Let X be the chance quantity denoting
the number of defective articles in the sample. Write down the expression
for f(x), the probability distribution of X. Write down the probability
distribution of X in table form. Construct the probability bar chart and cumulative
probability graph. Find the mean and variance of X.

4. Suppose a hand of 13 cards is dealt from a deck of 52 playing cards. Let X
be the chance quantity denoting the number of aces obtained. Write down the
expression for f(x), the probability distribution of X. Write down the probability
distribution in table form, and construct its probability bar chart and
cumulative probability graph. Find the mean and variance of X.

5. One cigarette from each of four brands A, B, C, D is partially smoked by
a blindfolded person. As soon as he has taken a few puffs on a cigarette, he
states the letter of the brand to which he considers it to belong. (Of course
he can use each letter only once.) Let X be the chance quantity denoting the
number of cigarettes correctly identified. If the identification is done at
random (i.e., he is equally likely to assign any letter to any cigarette), write
down the probability distribution of X in table form. Draw the probability bar
chart and cumulative probability graph for the distribution. Find the mean and
variance of X.

6. There is a box of 10 articles known to contain 2 defective articles. A
person looks for the defectives by taking one article out at a time and testing
it. What is the probability f(x) that the x-th article tested will be the last
defective in the box? (In this problem the chance quantity X is the number of
articles tested by the time the last defective is found; the possible values of
X are 2, 3, 4, 5, 6, 7, 8, 9, 10.) Construct the probability bar chart and the
cumulative probability graph for this case. Also find the mean and variance of X.
5.2 Continuous Probability Distributions.

In Section 5.1 we discussed a type of probability distribution called
a discrete probability distribution, i.e., one for which there is a chance
quantity X which can take on only certain isolated or discrete values. For example,
if X denotes the number of heads appearing in tossing 4 coins, the only possible
values X can take on are 0, 1, 2, 3, 4. If X denotes the number of aces in a
hand of bridge, the only possible values X can take on are 0, 1, 2, 3, 4.

But there are situations in which a chance quantity X can take on
(ideally) any value in an interval. This kind of chance quantity is called a
continuous chance quantity, and probability distributions associated with such
quantities are called continuous probability distributions. As mentioned in
Section 5.1, we shall omit the word continuous when no ambiguity arises. Our
problem here is to describe continuous probability distributions.

5.21 A simple continuous probability distribution.

We can fix the ideas by a simple example. Suppose we have a 360°
circular scale one "unit" long, and that we have a balanced pointer pivoted at
the center as shown in Figure 5.4.

Figure 5.4

Suppose that when the pointer is whirled and allowed to stop, it is "equally
likely" to stop anywhere. Then it follows from the Definition of Geometric
Probability that the probability of the pointer falling into any interval is
equal to the length of that interval divided by the length of the whole scale.
But we have chosen the length of the scale to be 1 unit. Thus, the probability,
for example, of the pointer falling between 0.2 and 0.6 is 0.4/1 = 0.4.

If we let X be the distance along the scale measured from the zero
point to the point at which the pointer stops, then X is a continuous chance
quantity. It can take on any value between 0 and 1. Its value for any single
whirl of the pointer depends on where the pointer stops. It is natural, therefore,
for us to try to set up a probability distribution which would give us any
probability dependent on the chance quantity X in the pointer problem which we
may wish to evaluate, just as Table 5.2 gives us any probability we may wish
to know about the chance quantity X representing the number of dots appearing
in a throw of two dice. The main question is this: How can we describe the
probability behavior of the pointer in mathematical terms?

The direct answer is this: Just as in the case of a discrete chance
quantity (see Section 5.1 and Figure 5.3), we construct a cumulative distribution
function F(x) which shows the probability that the chance quantity X is
less than or equal to any particular value x we may want to consider.

For the pointer example, the graph of F(x) is shown in Figure 5.5.

It is to be emphasized that for any particular x, say x', F(x') is the
probability that X ≤ x' (i.e., that the pointer will stop between 0 and x'), or,
written more briefly,

$$\Pr(X \le x') = F(x'). \qquad (5.4)$$

The graph of F(x) should be compared with that for the two-dice problem
in Figure 5.3. Notice that the graph in that case rises by jumps, while it
rises smoothly or continuously in the case of the pointer. This is what one
would expect, for in the two-dice problem the probability is partitioned into
eleven "pieces" and concentrated at 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 as shown
in Table 5.1, whereas in the pointer problem the probability is continuously
spread over the interval between 0 and 1 in a uniform manner.

As you will see from Figure 5.5, F(x) is zero for any x less than 0,
since the chance quantity X in the pointer problem can never be less than 0;
F(x) is 1 for any x greater than 1, since X is certain never to exceed 1. For
any x between 0 and 1, we agreed earlier in this section (and it follows from
the Definition of Geometric Probability) that the probability of X ≤ x is x,
or more briefly

$$\Pr(X \le x) = x. \qquad (5.5)$$



Cumulative Probability Graph When Pointer is 
Equally Likely to Stop Anywhere on the Scale 

Figure 5.5


But Pr(X ≤ x) is what we mean by F(x). Thus for any value of x between 0 and 1
we have F(x) = x, as you can see from the graph in Figure 5.5. Summarizing the
values of F(x) for the pointer problem, we may say:

$$F(x) = \begin{cases} 0 & \text{when } x < 0, \\ x & \text{when } 0 \le x \le 1, \\ 1 & \text{when } 1 < x. \end{cases} \qquad (5.6)$$

In the case of a continuous chance quantity, Pr(X ≤ x) = Pr(X < x),
i.e., it does not matter whether we use ≤ or <. The probability that X = x is
0, i.e., the probability that a continuous chance quantity has any single value
specified in advance is zero.

To find probabilities of events more complicated than X ≤ x in the
pointer problem, we can apply the rules of probability in Chapter 4. Suppose x'
and x'' are any two particular numbers (where x' < x''), and that we want to
find Pr(x' < X ≤ x''). This probability is simply the probability of X ≤ x''
minus the probability of X ≤ x', i.e.,

$$\Pr(x' < X \le x'') = \Pr(X \le x'') - \Pr(X \le x').$$

But from (5.4) we may write this as

$$\Pr(x' < X \le x'') = F(x'') - F(x'). \qquad (5.7)$$

But in our problem F(x) = x for 0 ≤ x ≤ 1. Therefore, for 0 ≤ x' < x'' ≤ 1,

$$\Pr(x' < X \le x'') = x'' - x'. \qquad (5.8)$$

For example, Pr(0.2 < X ≤ 0.3) = 0.1 and Pr(0.75 < X ≤ 0.87) = 0.12. The
probability that X falls in either the interval (0.2, 0.3) or the interval
(0.75, 0.87) is 0.1 + 0.12 = 0.22, since the two intervals have no point in common
(the events 0.2 < X ≤ 0.3 and 0.75 < X ≤ 0.87 are mutually exclusive).

If the pointer is whirled twice, the probability that it will stop
between 0.3 and 0.6 both times is, by the multiplication rule for probability,
the square of the probability of its stopping in this interval in one whirl,
i.e., (0.3)² = 0.09. And so on, for other special probability calculations.
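The pointer calculations above translate directly into code. Here is a minimal Python sketch that assumes the uniform pointer of formula (5.6). The simulation uses `random.random()`, which returns values uniformly distributed on [0, 1), as a stand-in for whirling the pointer. All names are illustrative.

```python
import random


def F(x):
    """Formula (5.6): cumulative probability for the uniform pointer."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return float(x)


def pr_between(x1, x2):
    """Formula (5.7): Pr(x1 < X <= x2) = F(x2) - F(x1), for x1 < x2."""
    return F(x2) - F(x1)


# Mutually exclusive intervals: the probabilities add.
either = pr_between(0.2, 0.3) + pr_between(0.75, 0.87)
# Two whirls, assumed independent: the probabilities multiply.
both_times = pr_between(0.3, 0.6) ** 2


def simulated_both_times(trials=100_000, seed=2):
    """Fraction of pairs of simulated whirls that both stop in (0.3, 0.6]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first, second = rng.random(), rng.random()
        if 0.3 < first <= 0.6 and 0.3 < second <= 0.6:
            hits += 1
    return hits / trials


print(round(either, 2), round(both_times, 2), simulated_both_times())
```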

5.22 More general continuous probability distributions.

Let us generalize the pointer problem a little. Suppose the pointer
is the pointer of a one-pound scale weighing "half-pound" packages as they come
off a production line. We may consider this flow of packages as constituting a
population. They will not all weigh exactly the same. In practice, a few of
them may weigh less than a half-pound, and most of them more than a half-pound
(since the manufacturer must guarantee a half-pound of material in each package).
The chance quantity X here is still the position on the scale (measured from the
zero point) at which the pointer will stop when weighing a package. For any
specified weight x' there will be a fraction F(x') of the packages weighing less
than x' pounds. The graph of F(x) may look something like that shown in
Figure 5.6.

In Figure 5.6 it will be seen that F(x) is 0 when x < 0, and F(x) = 1
when x > 1, just as in the simple case of uniformly distributed probability
(Figure 5.5). But between 0 and 1, F(x) is some S-shaped curve (or ogive)
instead of a straight line inclined at 45° to the X-axis. In such a problem as
that of weighing packages, there would not exist, in general, a mathematical
formula for F(x); the function F(x) would simply be given graphically by a
smooth curve. In order to be able to plot F(x) we would actually need a large
sample of measurements from which we could construct a frequency polygon, and
then "smooth" the polygon into a curve with drafting curves or by freehand
drawing. From a carefully drawn graph we could read off the value of F(x) at any
value of x to a practical degree of accuracy. The curve would then provide us
with a way of obtaining "estimated" probabilities. For example, reading from
the graph we make these "estimates":

$$\begin{aligned} \Pr(X \le 0.5) &= F(0.5) = 0.07, \\ \Pr(0.5 < X \le 0.9) &= F(0.9) - F(0.5) = 1.00 - 0.07 = 0.93. \end{aligned}$$

And so on. The median of X is that value x for which F(x) = 0.5, or median = 0.



Cumulative Probability Graph for Pointer
When Weighing "Half-Pound" Packages

Figure 5.6

Exercise 5.2.

1. Consider the pointer problem as having the cumulative probability
distribution F(x) graphed in Figure 5.5. What is the probability that the first digit
of X (after the decimal point) is even? That the second digit is < 5? That
the pointer stops within a distance of 0,E of the middle of the interval?

2. Suppose a pointer is equally likely to stop at any point on a scale between
0 and 10. Graph the cumulative probability function F(x) for this problem.

3. In problem No. 1, what is the probability that 0.3 < X < 0.6 or 0.4 < X
< 0.8? If the pointer is spun five times, what is the probability that it will
stop within the middle tenth of the interval at least once?

4. If the pointer in problem No. 1 is spun three times, what is the probability
the pointer will stop between 0 and x every time? If you call this probability
G(x), what formula does G(x) have for x < 0, for x > 1 and for 0 < x < 1?
Graph G(x). What is the interpretation of G(x)? What is the chance quantity
X in this problem?

5. Reading from the graph in Figure 5.6, the probability is about 0.9 that X
< what value? That X > what value? Estimate the probabilities that X will
fall in each of the following intervals: (0, 0.1), (0.1, 0.2), (0.2, 0.3),
(0.3, 0.4), (0.4, 0.5), (0.5, 0.6), (0.6, 0.7), (0.7, 0.8), (0.8, 0.9) and
(0.9, 1.0), and arrange them in table form.

5.3 Mathematical Manipulation of Continuous Probability Distributions.

The simplest way to express the distribution of a continuous chance
quantity is by means of a cumulative probability distribution. Tables of
special distributions like the normal distribution to be discussed in Chapter 8
are almost always tables of values of cumulative probability functions computed
for closely spaced values of x.

For easy mathematical handling, we often want to represent the probability
distribution of a continuous chance quantity in another way. In simple
cases this will allow us to find means and variances by integration (without
numerical computation from a table or otherwise).

5.31 Probability density functions - a simple case.

In Section 5.2 we showed how the probability of X falling less than or
equal to a value x' is represented by the ordinate at x' (Figures 5.5 and 5.6),
and the probability of X falling between two values, say x' and x'', by the
difference between the ordinates at x'' and x'. Now, can we find a curve in each
case so that the probability of X falling between x' and x'' is represented by
the area under the curve between x' and x'' (and above the X-axis)?

For the case of uniformly distributed probability (Figure 5.5)
the answer is easy. The curve in this case is a straight line parallel to the
X-axis and one unit above it, as shown in Figure 5.7.



Graph of Probability Density Function for Cumulative 
Probability Function graphed in Figure 5.5 

Figure 5.7

If we take two values of x, say x' and x'', the area under the graph of y = 1
between x = x' and x = x'' is (x'' - x')·1. But you will remember that this is
the probability that X will fall between x' and x'', and is equal to F(x'') -
F(x') in Figure 5.5. This is true for any two points x' and x''. Therefore,
we have found a "curve" (a straight line here) such that the area between it and
the X-axis which lies between any two values of x (x' and x'') has the same
numerical value as the difference between the ordinates of a cumulative
probability graph at x' and x''.

We shall call the function having the horizontal line in Figure 5.7
as its graph the probability density function for this problem, and its graph
the probability density graph. We can think of the graph in Figure 5.7 as the
histogram of relative frequencies in an indefinitely large sample of spins of
the pointer. As the sample becomes larger and larger, the histogram of relative
frequencies (based on any number of cells into which the range is divided)
"becomes" the probability density graph.

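A simulation can show the relative frequency histogram "becoming" the density graph. In this Python sketch, each bar height is the relative frequency of a cell divided by the cell width. That makes the total area of the bars 1, so the heights can be compared directly with the line y = 1 of Figure 5.7. The function name is hypothetical.

```python
import random


def density_histogram(n, cells=10, seed=3):
    """Bar heights for n simulated spins: relative frequency / cell width."""
    rng = random.Random(seed)
    counts = [0] * cells
    for _ in range(n):
        x = rng.random()                               # a spin, uniform on [0, 1)
        counts[min(int(x * cells), cells - 1)] += 1    # index of the cell holding x
    width = 1 / cells
    return [c / n / width for c in counts]


for n in (100, 10_000, 200_000):
    print(n, [round(h, 2) for h in density_histogram(n)])
```

As n grows, the printed heights should cluster more and more tightly around 1.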
5.32 Probability density functions - a more general case.

What will the probability density graph for the cumulative probability
graph of Figure 5.6 look like? Since we are looking for a curve such that
the area under it between x = x' and x = x'' is equal to F(x'') - F(x') in
Figure 5.6, for any two values x' and x'', we can use the following table of
figures (read from the graph in Figure 5.6) for constructing the desired curve:


TABLE 5.4

Ordinate Differences from Figure 5.6

 x'     x''    Ordinate Differences: F(x'') - F(x')
               (Area under desired curve between x' and x'')

 0      0.1       0   -  0    = 0
 0.1    0.2       0   -  0    = 0
 0.2    0.3       0   -  0    = 0
 0.3    0.4       0   -  0    = 0
 0.4    0.5      .07  -  0    = .07
 0.5    0.6      .62  - .07   = .55
 0.6    0.7      .96  - .62   = .34
 0.7    0.8     1.00  - .96   = .04
 0.8    0.9     1.00  - 1.00  = 0
 0.9    1.0     1.00  - 1.00  = 0
 1.0    1.1     1.00  - 1.00  = 0
 1.1    1.2     1.00  - 1.00  = 0
 1.2    1.3     1.00  - 1.00  = 0


Considering the 10 sets of values of x' and x'' in Table 5.4 as boundaries of
10 cells and the ten "areas" as relative frequencies, we can construct a
histogram so that the total area under the histogram is 1. If we imagine making
and 0.8 in Figure 5.6.
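The bar heights implied by Table 5.4 follow from the relation area = height × cell width. Here is a short Python sketch that uses only the four cells with nonzero area. Every other cell in the table has area 0 and therefore a bar of height 0. The names are illustrative.

```python
# Nonzero rows of Table 5.4: (x', x'', area = F(x'') - F(x')).
NONZERO_CELLS = [
    (0.4, 0.5, 0.07),
    (0.5, 0.6, 0.55),
    (0.6, 0.7, 0.34),
    (0.7, 0.8, 0.04),
]


def bar_heights(cells):
    """Height = area / width, so each bar's area equals its probability."""
    return [(lo, hi, area / (hi - lo)) for lo, hi, area in cells]


for lo, hi, height in bar_heights(NONZERO_CELLS):
    print(f"{lo:.1f} to {hi:.1f}: height {height:.2f}")
print(f"total area: {sum(area for _, _, area in NONZERO_CELLS):.2f}")
```

The heights come out as 0.70, 5.50, 3.40 and 0.40, and the total area is 1.00, as a histogram of relative frequencies scaled this way requires.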

We now have two ways of expressing the value of Pr(x' < X ≤ x''),
the probability of X falling between any two values x' and x''. First, we can
express this probability as the difference between the ordinates of the
cumulative probability curve F(x) of Figure 5.6 at x' and x'', i.e.,

$$\Pr(x' < X \le x'') = F(x'') - F(x'); \qquad (5.9)$$

and secondly, we can express the probability as the area under the graph of the
probability density function f(x) of Figure 5.8 between x = x' and x = x'', i.e.,

$$\Pr(x' < X \le x'') = \int_{x'}^{x''} f(x)\,dx. \qquad (5.10)$$

The expression on the right of (5.10) is the definite integral of f(x)dx
between x = x' and x = x'', and yields the desired area. Hence, from (5.9)
and (5.10),

$$F(x'') - F(x') = \int_{x'}^{x''} f(x)\,dx; \qquad (5.11)$$

putting x' = 0 (where F(0) = 0), and dropping the primes on x'', we have

$$F(x) = \int_{0}^{x} f(x)\,dx, \qquad (5.12)$$

which gives us a way of finding the cumulative probability function F(x) from
the probability density function f(x) by integration. Conversely, we can find
f(x) from F(x) by differentiation, i.e.,

$$f(x) = \frac{dF(x)}{dx}. \qquad (5.13)$$

You should get the difference between f(x) and F(x) firmly fixed in
mind and then keep the distinction clear.

Returning to the pointer problem in which the probability is uniformly
distributed, we have

$$f(x) = 1, \qquad 0 \le x \le 1,$$

$$F(x) = \int_0^x 1\cdot dx = x, \qquad 0 \le x \le 1,$$

as we saw earlier and as was graphed in Figure 5.5.
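Formulas (5.12) and (5.13) can be tried numerically. The Python sketch below integrates a density with the midpoint rule and differentiates a cumulative function with a central difference. Besides the pointer density f(x) = 1, it uses a hypothetical density f(x) = 2x on (0, 1), which is not from the text. That density's cumulative function is x², which shows a case where F(x) is not a straight line. All function names are illustrative.

```python
def cumulative_from_density(f, a, x, steps=10_000):
    """Formula (5.12) with lower limit a: integral of f from a to x (midpoint rule)."""
    if x <= a:
        return 0.0
    h = (x - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def density_from_cumulative(F, x, h=1e-6):
    """Formula (5.13): f(x) = dF/dx, approximated by a central difference."""
    return (F(x + h) - F(x - h)) / (2 * h)


def uniform_density(x):
    """Pointer problem: f(x) = 1 on (0, 1), 0 elsewhere."""
    return 1.0 if 0 <= x <= 1 else 0.0


def triangular_density(x):
    """Hypothetical density f(x) = 2x on (0, 1), 0 elsewhere; its F(x) is x**2."""
    return 2 * x if 0 <= x <= 1 else 0.0


print(cumulative_from_density(uniform_density, 0.0, 0.37))    # about 0.37
print(cumulative_from_density(triangular_density, 0.0, 0.5))  # about 0.25
print(density_from_cumulative(lambda x: x * x, 0.5))          # about 1.0 = 2(0.5)
```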

It should be noted that we are using the notation f(x) for the probability
distribution in both the discrete and continuous cases. There will be no
confusion, however, because it will always be clear from the context of any
problem or situation whether we are talking about the discrete case or the
continuous case (i.e., whether the chance quantity X is discrete or continuous).

5.33 Continuous probability distributions - general case.

We have discussed the setting up of a continuous probability distribution
for the stopping point of a pointer on a continuous scale reading from
0 to 1. The ideas introduced extend to more general situations involving
continuously distributed probabilities. More generally, we would have an interval
(α, β) (where α could be any finite number or -∞, and β could be any finite
number or +∞) within which it is certain that a chance quantity X will fall.
We would have a probability density function f(x) defined over the interval
(α, β) which could be represented as some kind of a smooth curve above the
X-axis. In the general case, the total area under the curve would represent
the probability of X falling between α and β and hence would be 1. The
probability of X falling between any two values x' and x'' would be represented by
a shaded area similar to that shown in Figure 5.8. We would have a smooth
cumulative probability distribution F(x); its graph would be similar to that
shown in Figure 5.6, except that the curve would extend over the interval (α, β)
instead of (0, 1). F(x) would be determined from f(x) by formula (5.12) with
0 replaced by α, and f(x) would be determined from F(x) by formula (5.13).

5.34 The mean and variance of a continuous probability distribution.

As in the discrete case discussed in Section 5.1, we may calculate
the mean μ and variance σ² of a continuous probability density function f(x)
defined over the interval (α, β). For the mean μ of the distribution f(x) we
have



$$\mu = \int_{\alpha}^{\beta} x f(x)\,dx, \qquad (5.14)$$

which is analogous to the formula (5.1a) for the discrete case. We may also
briefly write (just as was done in (5.1b))

$$\mu = E(X), \qquad (5.14a)$$

where it is understood that E(X) is a shorthand expression for the integral in
(5.14).

For the variance of the distribution f(x) we have

$$\sigma^2 = \int_{\alpha}^{\beta} (x-\mu)^2 f(x)\,dx; \tag{5.15}$$

a more convenient form is

$$\sigma^2 = \int_{\alpha}^{\beta} x^2 f(x)\,dx - \mu^2, \tag{5.15a}$$

or more briefly

$$\sigma^2 = E(X^2) - [E(X)]^2. \tag{5.15b}$$

Formulas (5.15), (5.15a) and (5.15b) are analogous to formulas (5.3a),
(5.3b) and (5.3c) in the discrete case.

Returning to the example of the pointer, consider the pointer equally
likely to stop at any point on the scale. Then f(x) = 1, as shown in Figure 5.7.
To get the mean, we use (5.14) with $\alpha = 0$ and $\beta = 1$:

$$\mu = E(X) = \int_0^1 x \cdot 1\,dx = \left[\frac{x^2}{2}\right]_0^1 = \frac{1}{2}.$$

To get the variance we use (5.15a) with $\alpha = 0$ and $\beta = 1$:

$$\sigma^2 = \int_0^1 x^2 \cdot 1\,dx - \left(\frac{1}{2}\right)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.$$
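The two integrals can also be checked numerically. The sketch below approximates (5.14) and (5.15a) with a midpoint Riemann sum. It assumes a finite interval $(\alpha, \beta)$. The helper name `mean_and_variance` and the number of subintervals are illustrative choices and do not come from the text.

```python
def mean_and_variance(f, alpha, beta, steps=100_000):
    """Approximate the mean (5.14) and variance (5.15a) of a density f
    on a finite interval (alpha, beta) with the midpoint rule."""
    h = (beta - alpha) / steps
    mids = [alpha + (i + 0.5) * h for i in range(steps)]
    mu = h * sum(x * f(x) for x in mids)          # integral of x f(x)
    second = h * sum(x * x * f(x) for x in mids)  # integral of x^2 f(x)
    return mu, second - mu ** 2                   # (5.15a)


# Pointer equally likely to stop anywhere on (0, 1): f(x) = 1
mu, var = mean_and_variance(lambda x: 1.0, 0.0, 1.0)
print(round(mu, 6), round(var, 6))  # 0.5 0.083333
```

The result agrees with the exact values 1/2 and 1/12 found above. The same function accepts any other density on a finite interval.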


Remarks on the statistical interpretation of continuous probability
distributions.

As in the case of discrete probability distributions, we will use a
continuous probability distribution as a population distribution "model".

More specifically, we will regard a continuous probability distribution as a
relative frequency distribution for an indefinitely large sample (i.e., for a
population) when the measurements can take on any value within a given interval.
The reasonableness of this "model" is clear if you consider taking larger
and larger samples from a population (in which measurements can "ideally" take
any value in a certain interval), and making histograms of relative frequencies
with smaller and smaller cell lengths. Under such conditions these histograms
become increasingly like a smooth curve such as that shown in Figure 5.8,
and their cumulative polygons become increasingly like a smooth curve such
as that shown in Figure 5.7. (In general, we would have other numbers than 0
and 1 for the end points of the interval.) Therefore, if we arbitrarily
introduce a smooth curve to represent the distribution of relative frequencies
for such a population, we have, at least in some cases, a fairly simple and
fairly accurate model to use in calculating frequencies, means and other
quantities dependent on the population distribution.
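A small simulation illustrates this limiting argument. For illustration it assumes the population is the uniform pointer distribution on (0, 1), and it uses pseudo-random numbers in place of repeated measurements. Each histogram is scaled so it can be compared directly with the density f(x) = 1: every height is the relative frequency in the cell divided by the cell length.

```python
import random


def density_histogram(sample, lo, hi, cells):
    """Histogram heights scaled so the total area is 1:
    (relative frequency in a cell) / (cell length)."""
    width = (hi - lo) / cells
    counts = [0] * cells
    for x in sample:
        k = min(int((x - lo) / width), cells - 1)  # guard against x == hi
        counts[k] += 1
    return [c / (len(sample) * width) for c in counts]


rng = random.Random(1)  # fixed seed so runs are repeatable
for n, cells in [(100, 5), (10_000, 20), (500_000, 50)]:
    sample = [rng.random() for _ in range(n)]
    heights = density_histogram(sample, 0.0, 1.0, cells)
    worst = max(abs(h - 1.0) for h in heights)
    print(f"n={n:>7}  cells={cells:>3}  largest gap from f(x)=1: {worst:.3f}")
```

As the sample grows, the largest gap between the scaled histogram and the density typically shrinks, even though the cells become narrower. The exact numbers depend on the seed.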


Exercise 5.3.

1. Suppose X is a chance quantity with a continuous probability distribution
f(x) = 1/10 on the interval (0, 10). Find the expression for the cumulative
probability distribution F(x) and graph it. Find the median (from F(x), not from
its graph). Find the mean $\mu$ and variance $\sigma^2$ of the distribution.

2. Suppose X is a chance quantity with a continuous probability distribution
f(x) = 2x on the interval (0, 1). Find F(x) and graph it. Find $\mu$ and $\sigma^2$.
Find the median and the lower and upper quartiles of the distribution. Find the
value of Pr(… < X < …); of Pr(X < …).

3. A chance quantity X has the continuous probability distribution f(x) = 6x(1 - x)
on the interval (0, 1). Find F(x) and graph it. Find $\mu$ and $\sigma^2$.

4. Since G(x) in problem No. 4 of Exercise 5.2 is a cumulative probability
distribution, find the probability density function from it. Find the mean and
the variance of X.

5. A point is taken "at random" from the interval (0, 1), all points being
equally likely. A second point is then taken in the same way. Let X be the
coordinate of the point halfway between these points. X is a continuous
chance quantity with a probability density function having an inverted V graph
as shown in the following figure:

[Figure: inverted-V graph of the probability density function of X over (0, 1)]

Find the values of Pr(X < .25), Pr(.1 < X < .9), Pr(X > .8). Write down the
formula for f(x). Find the mean and variance of X. Find the formula for F(x) and
graph F(x).
6. Suppose a continuous cumulative distribution F(x) is defined as follows:

$$F(x) = \begin{cases} 0, & x \le 1,\\ \ldots, & 1 < x < 3,\\ 1, & x \ge 3. \end{cases}$$

Graph F(x). Find the probability density function f(x) and graph it. Find
the mean and variance of X. Find Pr(2 < X < 5). Find the median.


CHAPTER 6. THE BINOMIAL DISTRIBUTION


6.1 Derivation of the Binomial Distribution.

Let us return to the coin-tossing problem discussed in Section 4.4.
There it was shown that if n "true" coins (i.e., coins for which heads and
tails are equally likely) are tossed once (or one "true" coin is tossed n
times), then the probability f(x) of getting x heads is given by formula (4.6),
i.e.,

$$f(x) = C^n_x \left(\frac{1}{2}\right)^n.$$




Now suppose we have a biased coin for which the probability of a head
is p and the probability of a tail is q (= 1 - p). What is the probability of
getting x heads in throwing such a coin n times?

First consider throwing the coin twice. The four possible events are

TT, TH, HT, HH.

Considering the results of the two throws as independent events, it
follows by Rule III (multiplication of probabilities) that the probabilities
for these four events are

qq, qp, pq, pp.

The two middle events each result in one head (and one tail), and the
probability of getting one head is, by Rule II (addition of probabilities), the sum
of the two probabilities, i.e., 2pq. Therefore, the probabilities of getting
0, 1, 2 heads in throwing the biased coin twice are

$q^2$, $2pq$, $p^2$

respectively.

In the case of tossing the coin three times, the possible events are

TTT, TTH, THT, HTT, THH, HTH, HHT, HHH,

and their probabilities are

qqq, qqp, qpq, pqq, qpp, pqp, ppq, ppp.

The second, third and fourth events result in one head; the fifth, sixth and
seventh result in two heads. Thus the probability of getting one head is
$qqp + qpq + pqq = 3pq^2$; similarly the probability of two heads is $3p^2q$. Hence the
probabilities of getting 0, 1, 2, 3 heads in throwing the biased coin three
times are

$q^3$, $3pq^2$, $3p^2q$, $p^3$

respectively.

In general, suppose we ask for the probability of x heads (and n - x
tails) in n tosses of the coin. Any particular order in which x heads and n - x
tails appear is simply an arrangement (permutation) of x H's and n - x T's, and
the probability of this particular arrangement is $p^x q^{n-x}$. We know from Section
4.3 (formula (4.2)) that there are $\frac{n!}{x!(n-x)!}$ (i.e., $C^n_x$) possible permutations of
x H's and n - x T's. Hence by Rule II (addition) the probability f(x) of getting
x heads and n - x tails is $p^x q^{n-x} + \cdots + p^x q^{n-x}$ (the number of terms being
$\frac{n!}{x!(n-x)!}$), that is,

$$f(x) = C^n_x\, p^x q^{n-x}. \tag{6.1}$$
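Formula (6.1) can be checked against the argument that produced it, which adds up the probabilities of every separate arrangement containing x heads. The sketch below does this by brute force for small n and compares the result with $C^n_x p^x q^{n-x}$. The value p = 0.3 is an arbitrary illustrative choice.

```python
from itertools import product
from math import comb


def binomial_pmf(x, n, p):
    """f(x) = C(n, x) p^x q^(n-x), formula (6.1)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def by_enumeration(x, n, p):
    """Sum p^x q^(n-x) over every H/T sequence of length n with x heads."""
    total = 0.0
    for seq in product("HT", repeat=n):
        if seq.count("H") == x:
            total += p**x * (1 - p) ** (n - x)
    return total


p = 0.3
three = [binomial_pmf(x, 3, p) for x in range(4)]
print([round(v, 3) for v in three])  # [0.343, 0.441, 0.189, 0.027]
```

The four printed values are $q^3$, $3pq^2$, $3p^2q$ and $p^3$ for this p, which agrees with the three-toss case worked out above.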


For example, suppose we imitate a biased coin by marking two of the
faces on a die H and four of the faces T. In this case, the probability of a
"head" is 1/3 and that of a "tail" is 2/3. If this "coin" is "tossed" 4 times, the
probability of getting 3 "heads" is

$$f(3) = C^4_3 \left(\frac{1}{3}\right)^3 \left(\frac{2}{3}\right) = \frac{8}{81}.$$
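The same arithmetic can be done exactly with rational numbers, which confirms the value 8/81 with no rounding. This is only a check of the example above. It uses Python's standard `fractions.Fraction` and `math.comb`.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 3)   # two faces marked H out of six
q = 1 - p            # four faces marked T
f3 = comb(4, 3) * p**3 * q**1
print(f3)            # 8/81
```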


Note that the expression for f(x) in (6.1) is simply the general term
in the expansion of the binomial $(q + p)^n$. In fact, we have

$$(q + p)^n = q^n + C^n_1 p q^{n-1} + C^n_2 p^2 q^{n-2} + \cdots + C^n_x p^x q^{n-x} + \cdots + p^n. \tag{6.2}$$


The first term on the right is the probability of 0 "heads" in n "tosses", the
second is the probability of 1 "head" in n "tosses", and so on. We may list
these terms in the form of a probability distribution table as follows:



TABLE 6.1

Probability Table for General Binomial Distribution

    x      f(x)
    0      $q^n$
    1      $C^n_1 p q^{n-1}$
    2      $C^n_2 p^2 q^{n-2}$
    .      .
    .      .
    x      $C^n_x p^x q^{n-x}$
    .      .
    .      .
    n      $p^n$
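Because q + p = 1, expansion (6.2) also shows that the entries of Table 6.1 add up to 1, as the probabilities of any distribution must. Below is a quick exact check for one case. The values n = 7 and p = 2/5 are assumed only for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_table(n, p):
    """The column f(x) of Table 6.1, for x = 0, 1, ..., n."""
    q = 1 - p
    return [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]


table = binomial_table(7, Fraction(2, 5))
print(table[0], table[-1], sum(table))  # 2187/78125 128/78125 1
```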



So far we have talked about "heads" and "tails" in throwing "coins".
We lose nothing by being a little more general and talking about event "E" and
event "not E". We can then summarize our discussion in the following important
result:

If an event E has probability p of occurring on each of n independent
trials, then the probability f(x) that it will occur exactly x times in n trials
is given by

$$f(x) = C^n_x\, p^x q^{n-x}, \quad \text{where } q = 1 - p. \tag{6.3}$$

The probability distribution (6.3) is naturally called the binomial
probability distribution, or simply the binomial distribution.

By putting p = 1/2 into formula (6.3) we get the formula for the
probability of obtaining x heads in tossing n unbiased coins, which we have already
discussed in Section 4.4 (see formula (4.6)).
For small values of … the individual probabilities in any binomial
distribution problem can be conveniently calculated by means of a recursion
formula, i.e., a formula relating the values of f(x) for two successive values of
x. For we have

$$f(x) = C^n_x\, p^x q^{n-x},$$

$$f(x+1) = C^n_{x+1}\, p^{x+1} q^{n-x-1}.$$

Taking ratios we have

$$\frac{f(x+1)}{f(x)} = \frac{C^n_{x+1}\, p^{x+1} q^{n-x-1}}{C^n_x\, p^x q^{n-x}} = \frac{n-x}{x+1}\cdot\frac{p}{q},$$

or

$$f(x+1) = \frac{n-x}{x+1}\cdot\frac{p}{q}\, f(x). \tag{6.4}$$

To use this recursion formula on any given problem we first calculate the value
of f(0), which is

$$f(0) = q^n;$$

then substituting x = 0 in (6.4) we get the value of f(1), i.e.,

$$f(1) = \frac{n}{1}\cdot\frac{p}{q}\, f(0).$$

Similarly

$$f(2) = \frac{n-1}{2}\cdot\frac{p}{q}\, f(1),$$

and so on for all terms.
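Recursion (6.4) translates directly into a loop. The sketch below assumes 0 ≤ p < 1, so that q is not zero. The function name is illustrative and is not the book's. Applied to the die example that follows (n = 4, p = 1/6), it reproduces the probabilities listed in Table 6.2.

```python
def binomial_by_recursion(n, p):
    """f(0), ..., f(n) from f(0) = q**n and recursion (6.4):
    f(x+1) = (n - x)/(x + 1) * (p/q) * f(x).  Requires q = 1 - p > 0."""
    q = 1 - p
    probs = [q**n]
    for x in range(n):
        probs.append(probs[-1] * (n - x) / (x + 1) * p / q)
    return probs


sixes = binomial_by_recursion(4, 1 / 6)
print([round(v, 3) for v in sixes])  # [0.482, 0.386, 0.116, 0.015, 0.001]
```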

Example: If a die is rolled four times, what is the probability distribution of
x, where x is the number of times the "six" occurs? The possible values of x are
0, 1, 2, 3, 4.

The probability of x "sixes" is

$$f(x) = C^4_x \left(\frac{1}{6}\right)^x \left(\frac{5}{6}\right)^{4-x}.$$

The recursion formula is

$$f(x+1) = \frac{4-x}{x+1}\cdot\frac{1}{5}\, f(x),$$

and $f(0) = \left(\frac{5}{6}\right)^4 = 0.482$. For the other values we have

$$f(1) = \frac{4}{1}\cdot\frac{1}{5}\, f(0) = 0.386,$$

$$f(2) = \frac{3}{2}\cdot\frac{1}{5}\, f(1) = 0.116,$$

$$f(3) = \frac{2}{3}\cdot\frac{1}{5}\, f(2) = 0.015,$$

$$f(4) = \frac{1}{4}\cdot\frac{1}{5}\, f(3) = 0.001.$$

Arranged in table form we have:


TABLE 6.2

Probabilities of Getting 0, 1, 2, 3, 4 Sixes in Rolling a Die Four Times

    x                f(x)                      f(x)
    (No. "sixes")                              (to 3 decimal places)
    0                $(5/6)^4$                 0.482
    1                $4(1/6)(5/6)^3$           0.386
    2                $6(1/6)^2(5/6)^2$         0.116
    3                $4(1/6)^3(5/6)$           0.015
    4                $(1/6)^4$                 0.001


In the case of large values of n, we find that a law known as the
normal or Gaussian probability distribution gives a good approximation to the
binomial distribution probabilities. The normal distribution and its
applications will be discussed in Chapter 8.

It should be made clear that it is not necessary to calculate the
probabilities in Table 6.2 by recursion. They could be calculated directly
from the formula for f(x). But you will find that if n is very much larger
than x, recursion computation simplifies the calculations.

6.2 The Mean and Standard Deviation of the Binomial Distribution.


In any given example where the values of n and p are given, we could













These formulas are important and we shall make repeated use of them.
The simplest derivations of them are based on theoretical sampling principles
for sampling from an indefinitely large population; these will be discussed in
Chapter 9. The direct derivation of the formulas is a little cumbersome, but
straightforward, and will be given here for the benefit of those who want to
see how they are directly derived.

The mean $\mu$ is given by applying formula (5.1) to the binomial
distribution (6.3). This gives

$$\mu = \sum_{x=0}^{n} x\, C^n_x\, p^x q^{n-x}. \tag{6.8}$$

Similarly, applying formula (5.3b) to the binomial distribution (6.3), we have

$$\sigma^2 = \sum_{x=0}^{n} x^2\, C^n_x\, p^x q^{n-x} - \mu^2. \tag{6.9}$$

Now we must find a simple expression for each of the two sums involved. In order
to do this, we introduce a device that may appear a little puzzling at first, but
provides an easy way of evaluating these sums. Let us write down the following
binomial expression and its expansion

$$(q + tp)^n = C^n_0 (tp)^0 q^n + C^n_1 (tp)^1 q^{n-1} + \cdots + C^n_x (tp)^x q^{n-x} + \cdots + C^n_n (tp)^n q^0, \tag{6.10}$$

where t is a letter just arbitrarily inserted. If we differentiate both sides
with respect to t, we have

$$np(q + tp)^{n-1} = 1\cdot[C^n_1 p\, q^{n-1}]\,t^0 + 2\cdot[C^n_2 p^2 q^{n-2}]\,t + \cdots + x\cdot[C^n_x p^x q^{n-x}]\,t^{x-1} + \cdots + n\cdot[C^n_n p^n q^0]\,t^{n-1}. \tag{6.11}$$

But if we put t = 1, and remember that q + p = 1, we find that the left side of
(6.11) reduces to np, and the right-hand side is simply the sum indicated on the
right-hand side of (6.8) all written out. Hence, the mean of the binomial
distribution (6.3) is

$$\mu = np, \tag{6.12}$$

as stated in formula (6.5). Now to find a simple expression for $\sigma^2$, let us
return to (6.11) and multiply both sides of the equation by t. We get

$$np\,t(q + tp)^{n-1} = 1\cdot[C^n_1 p\, q^{n-1}]\,t + 2\cdot[C^n_2 p^2 q^{n-2}]\,t^2 + \cdots + x\cdot[C^n_x p^x q^{n-x}]\,t^x + \cdots + n\cdot[C^n_n p^n q^0]\,t^n. \tag{6.13}$$


Now differentiate both sides with respect to t. We get

$$np(q + tp)^{n-1} + n(n-1)p^2 t(q + tp)^{n-2} = 1^2\cdot[C^n_1 p\, q^{n-1}]\,t^0 + 2^2\cdot[C^n_2 p^2 q^{n-2}]\,t + \cdots + x^2\cdot[C^n_x p^x q^{n-x}]\,t^{x-1} + \cdots + n^2\cdot[C^n_n p^n q^0]\,t^{n-1}. \tag{6.14}$$

Putting t = 1, and remembering that q + p = 1, we see that the right-hand side
of (6.14) becomes the written-out form of the sum indicated on the right-hand
side of (6.9). The left-hand side of (6.14) reduces to $np + n(n-1)p^2$ when
t = 1. Hence, substituting this quantity in (6.9), and also the value of $\mu$ (i.e.,
np), we find the variance $\sigma^2$ to be

$$\sigma^2 = np + n(n-1)p^2 - n^2p^2 = np - np^2 = npq.$$

Therefore, the variance of X in the binomial distribution (6.3) is

$$\sigma^2 = npq, \tag{6.16}$$

as stated in formula (6.6), and the standard deviation is

$$\sigma = \sqrt{npq}.$$
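For any particular n and p, the sums (6.8) and (6.9) can also be computed directly and compared with np and npq. The sketch below uses exact fractions, so the agreement is exact rather than approximate. The values n = 12 and p = 3/10 are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb


def binomial_mean_var(n, p):
    """Mean and variance by direct summation, formulas (6.8) and (6.9)."""
    q = 1 - p
    f = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]
    mean = sum(x * fx for x, fx in enumerate(f))
    var = sum(x * x * fx for x, fx in enumerate(f)) - mean**2
    return mean, var


n, p = 12, Fraction(3, 10)
mean, var = binomial_mean_var(n, p)
print(mean, var)                              # 18/5 63/25
print(mean == n * p, var == n * p * (1 - p))  # True True
```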


6.3 "Fitting" a Binomial Distribution to a Sample Frequency Distribution.

In the "true" coin and "true" dice problems mentioned earlier, the
numerical value of p was agreed to in advance, and theoretical binomial
distributions of probabilities of numbers of heads (or numbers of "sixes") in
n trials were obtained. But there are problems in which one cannot arrive at
such an agreement or determine in advance what value p has. In such cases p
will usually be estimated experimentally in accordance with Definition II in
Section 4.1. To take a simple example, what is the probability p that an
ordinary celluloid-headed thumb tack, when thrown on the floor, will fall "point up"?
One cannot tell by examining the tack. But he can estimate it by throwing it a
large number of times or throwing several tacks like it a large number of times.
For instance, in 100 throws of 5 tacks the observed frequency distribution of
values of x (number of tacks falling point up) in Table 6.3 was actually
obtained.

TABLE 6.3

Frequency Distribution of Number of Tacks Falling
Point Up in Throwing 5 Tacks 100 Times

    x     0     1     2     3     4     5
    f     2    14    20    34    22     8


Now the question is: How do we estimate p, the probability of a tack falling
"point up"? If one were to set up a probability distribution for x (the number
of tacks in 5 falling "point up"), using the unknown probability p, he would
obtain

$$f(x) = C^5_x\, p^x q^{5-x}$$

for x = 0, 1, 2, 3, 4, 5.

To estimate p, we set the mean of the (theoretical) probability
distribution equal to the mean of the (observed) frequency distribution. Since n = 5,
the mean of the theoretical distribution is (by formula (6.5))

$$\mu = 5p,$$

and the mean of the observed distribution is

$$\bar{X} = \frac{1}{100}\sum_{x=0}^{5} x f = \frac{284}{100} = 2.84.$$
Hence, equating the values of $\mu$ and $\bar{X}$, from

$$5p = 2.84$$

we get the following estimate of p based on the experimental results in
Table 6.3:

$$p = .568.$$
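The estimate is simply the observed mean divided by n = 5. As a check on the arithmetic, here is the computation from the frequencies in Table 6.3.

```python
from fractions import Fraction

# Table 6.3: number of tacks (out of 5) falling point up -> frequency
freq = {0: 2, 1: 14, 2: 20, 3: 34, 4: 22, 5: 8}

N = sum(freq.values())                                   # 100 throws
xbar = Fraction(sum(x * f for x, f in freq.items()), N)  # observed mean
p_hat = xbar / 5                                         # from 5p = mean
print(N, float(xbar), float(p_hat))                      # 100 2.84 0.568
```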

It should be noted that if we consider our data as coming from the
results of throwing one tack 500 times, we would have 284 trials in which the tack
fell point up, thus giving us p = 284/500 = .568 directly by probability Definition II.

It is to be emphasized that .568 is only an estimate of p based on one experiment.
If another experiment were made, a value would be obtained which would probably
be slightly different but near .56 or .57.

Using this value of p in $C^5_x p^x q^{5-x}$ for x = 0, 1, 2, 3, 4, 5, we would
have the "fitted" binomial distribution

$$C^5_x (.568)^x (.432)^{5-x},$$

which can be expected to "approximate" the observed relative frequency
distribution of the number of tacks falling point up. The results are shown in Table 6.4.
The entries in the column headed "expected" frequency are obtained by simply
multiplying the "fitted" probabilities by 100 (the total number of throws of the 5 tacks).

It will be noticed that there is a fairly close agreement between
relative frequencies and "fitted" binomial probabilities, and similarly between
observed frequencies and "expected" frequencies. The cumulative observed
frequencies and cumulative "expected" frequencies are also in good agreement, as
you will see.
TABLE 6.4

Fitting of a Binomial Distribution
to the Data in Table 6.3

    x      Observed    Relative    "Fitted" binomial                  "Expected"   Cumulative   Cumulative
           frequency   frequency   probability distribution           frequency    observed     "expected"
                                                                                   frequency    frequency
    0          2         .02       $C^5_0(.568)^0(.432)^5 = .015$         1.5            2            1.5
    1         14         .14       $C^5_1(.568)^1(.432)^4 = .099$         9.9           16           11.4
    2         20         .20       $C^5_2(.568)^2(.432)^3 = .260$        26.0           36           37.4
    3         34         .34       $C^5_3(.568)^3(.432)^2 = .342$        34.2           70           71.6
    4         22         .22       $C^5_4(.568)^4(.432)^1 = .225$        22.5           92           94.1
    5          8         .08       $C^5_5(.568)^5(.432)^0 = .059$         5.9          100          100
    Total    100        1.00                                 1.000      100
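The fitted and expected columns of Table 6.4 follow mechanically from p = .568. The short script below recomputes them so that each row can be checked. It is an illustrative sketch and was not part of the original presentation.

```python
from itertools import accumulate
from math import comb

observed = [2, 14, 20, 34, 22, 8]   # Table 6.3, x = 0..5
n, p = 5, 0.568                     # estimate found above
N = sum(observed)                   # 100 throws of the 5 tacks

fitted = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
expected = [N * f for f in fitted]

rows = zip(observed, fitted, expected, accumulate(observed), accumulate(expected))
for x, (obs, fit, exp_f, cum_obs, cum_exp) in enumerate(rows):
    print(f"{x}  {obs:3d}  {obs / N:.2f}  {fit:.3f}  {exp_f:5.1f}  {cum_obs:3d}  {cum_exp:5.1f}")
```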




1. If ten "true" dice are tossed once, what is the probability that x aces
will turn up? What is the probability that at least two aces will turn up?
That not more than one ace will turn up?


2. Assume that on the average one telephone number out of five called between
four and five p.m. on weekdays in a certain city is busy. What is the
probability that if ten randomly selected telephone numbers are called, not more
than two of them will be busy?

3. Consider all families of five children in which there are no twins.
Assuming the probabilities of a child being a boy or a girl to be equal, what
fraction of the families would you estimate to have at least one son and at
least one daughter? What fraction would have five sons or five daughters?

4. If a hand of 13 cards is dealt from a deck of 52 bridge cards, the
probabilities are approximately .30, .44, and .26 of getting no aces, getting one
ace and getting more than one ace respectively. What is the probability that
a person will play four hands of bridge and never receive an ace? That he will
play four hands and never fail to get at least one ace? That he will play four
hands and get at least two aces every time?

5. In problem No. 4, let X be the number of hands in which a player gets no
aces in playing four hands. Write down the expression for f(x), the
probability that X = x. Write down the probability distribution of X in table form.
Find the mean and variance of X.

6. In inspecting 1450 welded joints produced by a certain type of welding
machine, 148 defective joints were found. In welding five joints, what is the
probability of getting 0, 1, 2, 3, 4, 5 defective joints?

7. Two coins are tossed together five times. Let X be a chance quantity
denoting the number of pairs of heads obtained. Write down the expression for
f(x), the probability that X = x. Write down the probability distribution of
X in table form. Find the mean and variance of X.

8. In throwing a single die, suppose an event E is defined as the appearance
of a "five or six". Three dice were thrown 50 times and yielded the following
distribution of X (the number of E's per throw):

    x     0     1     2     3
    f    18    20    11     1

(a) Estimate from the data the probability of getting an E in a
single throw of one die.

(b) "Fit" a binomial distribution to the observed distribution,
and calculate "fitted" binomial probabilities and "expected"
frequencies.

(c) If the dice are true, the probability of an E is what? Use
this value of p in the binomial distribution and "fit" this
binomial distribution to the observed distribution.



CHAPTER 7. THE POISSON DISTRIBUTION


7.1 The Poisson Distribution as a Limiting Case of the Binomial Distribution 

In using the binomial distribution (6.1) one frequently encounters
situations in which p is very small (less than .1) and n large (greater than
50), so that the mean np is some "moderate" number (between 0 and 10, say). In
such cases there is an approximation for formula (6.1) which is simpler to deal
with than (6.1) itself. If we put np = m, then the approximate distribution
is

$$f(x) = \frac{m^x e^{-m}}{x!}, \tag{7.1}$$

called the Poisson probability distribution, or more briefly, the Poisson
distribution, where the possible values of x are 0, 1, 2, 3, 4, ... (to infinity),
and where e = 2.71828..., the base of natural logarithms. (Actually, e is the
limiting value of $(1 + \frac{1}{k})^k$ as k is allowed to become indefinitely large, and can
be shown to be given by the formula

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots.$$

The value of e can be computed to any desired degree of accuracy by taking the
sum of this series to a sufficiently large number of terms.)
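Both descriptions of e in the remark above are easy to try numerically. The sketch below compares partial sums of the series with $(1 + 1/k)^k$ for increasing k. The particular numbers of terms and the values of k are arbitrary choices.

```python
import math


def e_by_series(terms):
    """Partial sum 1 + 1/1! + 1/2! + ... with `terms` terms."""
    return sum(1 / math.factorial(j) for j in range(terms))


for terms in (3, 6, 10, 15):
    print(terms, e_by_series(terms))

for k in (10, 1_000, 100_000):
    print(k, (1 + 1 / k) ** k)

print(math.e)  # 2.718281828459045
```

The series converges much faster than the limit. Ten terms already agree with e to about six decimal places, while $(1 + 1/k)^k$ with k = 100,000 is still off in the fifth decimal place.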

7.2 Derivation of the Poisson Distribution.

The argument which gives (7.1) as the approximate distribution runs
as follows. Remembering that $C^n_x$ may be written as $\frac{n(n-1)\cdots(n-x+1)}{x!}$, we
write the binomial distribution (6.1) as follows:

$$f(x) = \frac{n(n-1)\cdots(n-x+1)}{x!}\, p^x (1-p)^{n-x}. \tag{7.2}$$

Putting $p = \frac{m}{n}$, we may rewrite the right side of (7.2) as

$$\left(\frac{n}{n}\right)\left(\frac{n-1}{n}\right)\cdots\left(\frac{n-x+1}{n}\right)\frac{m^x}{x!}\left(1 - \frac{m}{n}\right)^{n}\left(1 - \frac{m}{n}\right)^{-x}.$$

Now let us hold x and m fixed, let n become indefinitely large, and see what
happens to each term in this expression. Each of the terms $\frac{n}{n}, \frac{n-1}{n}, \ldots, \frac{n-x+1}{n}$
and $\left(1 - \frac{m}{n}\right)^{-x}$ has the value 1 as its limiting value. The term $\left(1 - \frac{m}{n}\right)^n$ can
be written as

$$\left[\left(1 - \frac{m}{n}\right)^{n/m}\right]^m.$$

If, for the moment, we write $\frac{n}{m}$ as k, this expression can be written as

$$\left[\left(1 - \frac{1}{k}\right)^k\right]^m.$$

If n increases indefinitely, so does k. Hence, the expression inside the square
brackets has $e^{-1}$ as its limiting value. Making use of the fact that the limiting
value of a product of terms (such as those considered here) is equal to the
product of the limiting values of the separate terms, we find that the limiting value
of (7.2) is

$$1 \cdot 1 \cdots 1 \cdot \frac{m^x}{x!}\,(e^{-1})^m \cdot 1 = \frac{m^x e^{-m}}{x!},$$

which is the Poisson distribution (7.1).

It should be noted that if n is allowed to approach infinity so that
np = m is fixed, then p must approach zero. Hence, the fact that the binomial
distribution has the Poisson distribution (7.1) as a limit when n approaches
infinity and p approaches zero (so that np is a constant, m) means that the
Poisson distribution is an approximation to the binomial distribution for "large"
n and "small" p.
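The limit can be followed numerically. Hold m and x fixed, let n grow with p = m/n, and compare the binomial probability (6.1) with the Poisson value (7.1). The choice m = 2, x = 3 is only an illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)   # formula (6.1)


def poisson_pmf(x, m):
    return m**x * exp(-m) / factorial(x)            # formula (7.1)


m, x = 2.0, 3
target = poisson_pmf(x, m)
for n in (10, 100, 1_000, 10_000):
    approx = binomial_pmf(x, n, m / n)              # p = m/n shrinks as n grows
    print(f"n={n:>6}  binomial={approx:.6f}  poisson={target:.6f}  gap={abs(approx - target):.2e}")
```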

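To see that the sum of the probabilities in the Poisson distribution
is unity, we write

$$\frac{m^0}{0!}e^{-m} + \frac{m^1}{1!}e^{-m} + \frac{m^2}{2!}e^{-m} + \frac{m^3}{3!}e^{-m} + \cdots = e^{-m}\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \cdots\right).$$

But it can be shown that

$$1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \cdots = e^m.$$

Hence the sum of the probabilities is equal to $e^{-m}(e^m) = 1$.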
7.3 The Mean and Variance of a Poisson Distribution.

The Poisson distribution (7.1) has the following simple property:

The mean and variance are each equal to m.

This may be seen if we proceed as we did in the case of the binomial
distribution. We first consider the expansion

$$e^{tm} = 1 + tm + \frac{t^2m^2}{2!} + \frac{t^3m^3}{3!} + \cdots, \tag{7.6}$$

then multiply by $e^{-m}$ and get

$$e^{-m}e^{tm} = e^{-m} + tm\,e^{-m} + \frac{t^2m^2}{2!}e^{-m} + \frac{t^3m^3}{3!}e^{-m} + \cdots. \tag{7.7}$$


Now differentiating with respect to t we find

$$m\,e^{-m}e^{tm} = m\,e^{-m} + \frac{2tm^2}{2!}e^{-m} + \frac{3t^2m^3}{3!}e^{-m} + \cdots. \tag{7.8}$$

By putting t = 1, and using summation notation, we see that the right-hand side
of (7.8) is

$$\sum_{x=0}^{\infty} x\,\frac{m^x}{x!}e^{-m},$$

which is, by definition, the mean of the Poisson distribution (7.1). But its
value must be equal to the left-hand side of (7.8) when t = 1, which is m.
Hence, the mean of the Poisson distribution (7.1) is m, i.e.,

$$\mu = m.$$


To find the variance, we first multiply both sides of (7.8) by t, getting

$$m\,t\,e^{-m}e^{tm} = tm\,e^{-m} + \frac{2t^2m^2}{2!}e^{-m} + \frac{3t^3m^3}{3!}e^{-m} + \cdots. \tag{7.9}$$

Then differentiating with respect to t,

$$(m + m^2t)\,e^{-m}e^{tm} = m\,e^{-m} + 2^2\cdot\frac{tm^2}{2!}e^{-m} + 3^2\cdot\frac{t^2m^3}{3!}e^{-m} + \cdots. \tag{7.10}$$

Now putting t = 1, the expression on the right-hand side of (7.10) is

$$\sum_{x=0}^{\infty} x^2\,\frac{m^x}{x!}e^{-m}, \tag{7.11}$$





which has a value equal to the left-hand side (after putting t = 1), namely,
$m + m^2$. But we know from the general formula for the variance (5.3b) that
the variance of the Poisson distribution is (7.11) (which has the value $m + m^2$)
minus the square of the mean, i.e.,

$$\sigma^2 = (m + m^2) - (m)^2 = m,$$

thus establishing the fact that the variance of a Poisson distribution is equal
to m.
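A numerical check of the three facts just derived (total probability 1, mean m, variance m) only requires cutting the infinite sums off after a finite number of terms. The sketch below generates the terms with the ratio f(x+1) = f(x)·m/(x+1), which follows directly from (7.1). It stops after 100 terms, which is assumed to be ample for the moderate values of m tried here.

```python
from math import exp


def poisson_moments(m, terms=100):
    """Truncated sums of f(x), x f(x) and x^2 f(x) for the Poisson distribution (7.1);
    returns (total probability, mean, variance)."""
    f = exp(-m)                  # f(0)
    total = mean = second = 0.0
    for x in range(terms):
        total += f
        mean += x * f
        second += x * x * f
        f *= m / (x + 1)         # f(x+1) = f(x) * m / (x+1)
    return total, mean, second - mean**2


for m in (0.1, 2.0, 3.87, 6.0):
    total, mean, var = poisson_moments(m)
    print(f"m={m}: total={total:.6f}  mean={mean:.6f}  variance={var:.6f}")
```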

As a simple example of the use of the Poisson distribution in approximating
a binomial distribution for small values of p and large values of n, consider
the following.

Example: Five coins are tossed 64 times. What are the probabilities of getting
five heads 0, 1, 2, 3, 4, 5, ..., 64 times?

The exact values of these probabilities are given by the binomial
probability distribution (6.1) for p = 1/32 (the probability of five heads in throwing
5 coins) and n = 64 (the number of throws of five coins), i.e.,

$$f(x) = C^{64}_x \left(\frac{1}{32}\right)^x \left(\frac{31}{32}\right)^{64-x} \tag{7.12}$$

for x = 0, 1, 2, 3, ..., 64.

Approximate values of these probabilities are given by the Poisson
distribution, with $m = np = 64\left(\frac{1}{32}\right) = 2$, i.e.,

$$f(x) = \frac{2^x e^{-2}}{x!} \tag{7.13}$$

for x = 0, 1, 2, 3, ... (to infinity).

The fact that x goes from 0 to 64 in the binomial case and from 0 to
$\infty$ in the Poisson case will probably seem puzzling. But the point to be
emphasized is that the Poisson distribution is only an approximation (involving a
simple formula) to the binomial distribution; the probabilities are all nearly
zero anyway beyond x = 10 in this example. The accuracy of the Poisson
approximation and the rapidity with which the probabilities become small as x becomes
large are shown in Table 7.1.
TABLE 7.1

Exact Probabilities (from Binomial Distribution)
and Approximate Probabilities (from Poisson Distribution)
for Obtaining 5 Heads 0, 1, 2, etc., Times in Throwing
5 Coins 64 Times

    x     Exact probability          Approximate probability
          (from binomial             (from Poisson
          distribution (7.12))       distribution (7.13))
    0          .131                       .135
    1          .271                       .271
    2          .275                       .271
    3          .183                       .180
    4          .090                       .090
    5          .035                       .036
    6          .011                       .012
    7          .003                       .003
    8          .001                       .001
    9          .000                       .000
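Both columns of Table 7.1 can be regenerated from (7.12) and (7.13), which is a convenient way to check individual entries.

```python
from math import comb, exp, factorial

n, p = 64, 1 / 32          # 64 throws of five coins; p = probability of five heads
m = n * p                  # = 2, the Poisson parameter in (7.13)

exact = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(10)]   # (7.12)
approx = [m**x * exp(-m) / factorial(x) for x in range(10)]           # (7.13)

for x, (b, ps) in enumerate(zip(exact, approx)):
    print(f"{x}  {b:.3f}  {ps:.3f}")
```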


7.4 "Fitting" a Poisson Distribution to a Sample Frequency Distribution.

In the example just given, we had a situation in which we were given
the values of n and p, and hence the product np (= m), in advance. But there are
problems in which we do not have enough information to know m in advance, in
which case it has to be "estimated" experimentally. In such cases, we obtain
a frequency distribution of values of x from a sample of "measurements" and
"fit" a Poisson probability distribution to the relative frequency distribution
by equating the mean of the sample distribution to the mean m of the
Poisson distribution.

As an illustration we shall consider an experiment conducted by
Rutherford and Geiger. They counted the number of alpha-particles emitted
from a disc in 2608 periods of time, each period of 7.5 seconds duration. The
frequencies f of periods in which x particles (x = 0, 1, 2, 3, ...) were
counted are shown in column (b) of Table 7.2.

The mean of the distribution of the observed frequencies is

$$\bar{X} = \frac{1}{2608}\sum_{x=0}^{14} x f = 3.870,$$

which is the average number of alpha-particles per 7.5-second interval. The
mean of the Poisson distribution is m. Hence, equating m to the observed mean,
we have

$$m = 3.870.$$


The "fitted" Poisson distribution is, therefore, obtained by putting m = 3.87
in formula (7.1), thus giving us

$$f(x) = \frac{(3.87)^x e^{-3.87}}{x!} \tag{7.14}$$

for x = 0, 1, 2, 3, ... as the "fitted" distribution. Substituting successively
x = 0, 1, 2, 3, 4, ..., 14 (probabilities for values of x beyond 14 are too
small to be significant in this problem) into formula (7.14), we obtain the
"fitted" values in column (d) of Table 7.2.


TABLE 7.2

Fitting of a Poisson Distribution to the Rutherford-Geiger
Data on Number of α-Particles Emitted per 7.5-Second Interval

    (a)     (b)          (c)         (d)               (e)          (f)          (g)
    x       Observed     Relative    "Fitted" Poisson  "Expected"   Cumulative   Cumulative
            frequency    frequency   probability       frequency    observed     "expected"
                                     distribution                   frequency    frequency
    0          57        .0219       .0209               54.4           57           54.4
    1         203        .0778       .0807              210.5          260          264.9
    2         383        .1469       .1562              407.4          643          672.3
    3         525        .2013       .2015              525.5         1168         1197.8
    4         532        .2040       .1949              508.4         1700         1706.2
    5         408        .1564       .1509              393.5         2108         2099.7
    6         273        .1047       .0973              253.8         2381         2353.5
    7         139        .0533       .0538              140.3         2520         2493.8
    8          45        .0173       .0260               67.9         2565         2561.7
    9          27        .0104       .0112               29.2         2592         2590.9
    10         10        .0038       .0043               11.3         2602         2602.2
    11          4        .0015       .0015                4.0         2606         2606.2
    12          2        .0008       .0005                1.3         2608         2607.5
    13          0        .0000       .0001                 .4         2608         2607.9
    14          0        .0000       .0000                 .1         2608         2608.0
    Total    2608       1.0000      1.0000             2608.0

(x = number of α-particles per 7.5-second interval; frequencies are numbers of intervals.)
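The fitting procedure of this section can be reproduced from column (b) alone. In the sketch below the observed mean is rounded to 3.87, which is the value the text uses in (7.14). The fitted probabilities, expected frequencies and both cumulative columns are then recomputed for comparison with the table.

```python
from itertools import accumulate
from math import exp, factorial

# Column (b) of Table 7.2: observed frequencies for x = 0, 1, ..., 14
observed = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 2, 0, 0]

N = sum(observed)                                        # 2608 intervals
xbar = sum(x * f for x, f in enumerate(observed)) / N    # observed mean
m = round(xbar, 2)                                       # 3.87, as used in (7.14)

fitted = [m**x * exp(-m) / factorial(x) for x in range(len(observed))]  # column (d)
expected = [N * f for f in fitted]                                      # column (e)

rows = zip(observed, fitted, expected, accumulate(observed), accumulate(expected))
for x, (obs, fit, exp_f, cum_obs, cum_exp) in enumerate(rows):
    print(f"{x:2d} {obs:4d} {obs / N:.4f} {fit:.4f} {exp_f:6.1f} {cum_obs:5d} {cum_exp:7.1f}")
```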
The closeness of the fit of the Poisson distribution to the (observed)
relative frequencies is clear from a comparison of the figures in column (d) with 
those in column (c), or from a comparison of those in column (e) with those in 
column (b), or from a comparison of those in column (g) with those in column (f). 

A question which one would normally ask is this: Since m = np, what
are n and p in this alpha-particle counting problem? The answer is this: Think
of the number of atoms in the radioactive material as being n and the
probability of an atom emitting an alpha-particle in a 7.5-second interval as being p.
Hence, for a single 7.5-second interval we can think of there being a very large
number n of trials, for each of which there is a very small probability p of an
alpha-particle being emitted. Hence, the mean number of alpha-particles emitted
in a 7.5-second interval is np. But an experimental value of np has
been determined, namely 3.87, i.e., np = 3.87. Thus

$$p = \frac{3.87}{n}.$$

The probability that x of the atoms will emit alpha-particles in a specified
7.5-second interval is given by the binomial distribution

$$C^n_x \left(\frac{3.87}{n}\right)^x \left(1 - \frac{3.87}{n}\right)^{n-x}. \tag{7.15}$$

Since n, the number of atoms, is very large, it follows (by the same argument
used in Section 7.2 to establish (7.1) as the limiting value of the
binomial distribution (6.1)) that the limiting value of (7.15) is the Poisson
distribution (7.14).

It should be noted that in fitting the Poisson distribution to the
Rutherford-Geiger data, it is not necessary to know the value of n or p
individually; it is sufficient to know only np (= m), and this is determined
experimentally.

This is true in many situations, i.e., it is sufficient to know only
np (= m) and not n and p separately. For instance, suppose that on the average
3 nails out of 100 manufactured by an automatic nail machine are defective and
that the nails are packaged in boxes of 200. Then the average number of
defective nails per box of 200 is 6 (i.e., m = np = 6). Hence, we can say that the
probability of a box containing x defective nails is

$$f(x) = \frac{6^x e^{-6}}{x!}$$
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for X - 0, 1, 2, 3, ,,. , In -writing do-wn this distribution note that it was 

not necessary to use n and p individually; we used only the product np. Actually 
2 

n = 200 ajQd p = . To take another example, suppose a wool loom leaves an 

average of 1 defect of a certain type per 100 square yards of woolen cloth. What 
is the probability that a piece of woolen cloth consisting of 10 square yards 
has X defects in it? We must first find the mean number of defects per 10 square 
yards. This is clearly ,1, i,e,, m = .1. Hence the required probability is 

xl 

In this case we may think of dividing the material into a large number n of small 
areas, and think of p as the probability of a defect in a given small area. Evi¬ 
dently n would be large and p small. But the important thing is that the mean 
number of defects per unit area be kno-wn. It is 1 defect per 100 square yards, 
or ,1 defect per 10 square yards. 
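The two examples can be evaluated directly. In this Python sketch (our own illustration; the function name `poisson_pmf` is hypothetical) the Poisson probabilities are computed from m alone; for the nails, where n = 200 and p = 3/100 happen to be known, the exact binomial probabilities are printed alongside so that the quality of the approximation can be seen.

```python
import math

def poisson_pmf(x, m):
    """Probability of exactly x occurrences when the mean number is m."""
    return m ** x * math.exp(-m) / math.factorial(x)

# Nails: 3 defective per 100, boxes of 200, so m = np = 200 * 3/100 = 6.
n_nails, p_nails = 200, 3 / 100
m_nails = n_nails * p_nails
# Cloth: 1 defect per 100 square yards, pieces of 10 square yards, so m = .1.
m_cloth = 10 / 100

print(" x   nails (Poisson)  nails (binomial)  cloth (Poisson)")
for x in range(5):
    exact = math.comb(n_nails, x) * p_nails ** x * (1 - p_nails) ** (n_nails - x)
    print(f"{x:>2}   {poisson_pmf(x, m_nails):14.4f}  {exact:16.4f}  {poisson_pmf(x, m_cloth):15.4f}")
```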

The Poisson distribution occurs in many situations involving events 
occurring in time intervals of fixed length, space of fixed volume, areas of 
fixed size, line segments of fixed length, etc. Such examples are as follows: 

(a) Distribution of numbers of telephone calls received at a given 
switchboard per minute (for a large number of minutes) for a 
given part of the day. 

(b) Distribution of numbers of automobiles passing a given point
on a highway per minute (for a large number of minutes) at a
given time of the day.

(c) Distribution of numbers of bacterial colonies in a given culture
per .01 square millimeter (for a large number of units of
.01 sq. mm.) on a microscope slide.

(d) Distribution of numbers of deaths per day (for a large number
of days) by heart attack in a large city.

(e) Distribution of numbers of typographical errors per page (for a
large number of pages) in typed material.

(f) Distribution of numbers of fragments per square foot (for many
square foot units) received by a test surface exposed to a
fragmentation bomb at a given distance from the detonated bomb.

(g) Distribution of numbers of times one receives four aces per 75
hands of bridge (for a large number of sets of 75 hands).

(h) Distribution of numbers of defective screws per box of 100 screws
(for a large number of boxes).

Exercise 7.

The following table of values of e^(-m) is given for convenience in working the problems
in this Exercise:


TABLE 7.5

Table of Values of e^(-m)

   m     e^(-m)        m     e^(-m)
   .1    .9048         1     .3679
   .2    .8187         2     .1353
   .3    .7408         3     .0498
   .4    .6703         4     .0183
   .5    .6065         5     .0067
   .6    .5488         6     .0025
   .7    .4966         7     .0009
   .8    .4493         8     .0003
   .9    .4066         9     .0001


1. Three dice are rolled 216 times. Using the Poisson distribution as an
approximation to the binomial distribution, write down the approximate
probability of getting 3 aces x times. Work out the probability distribution of
X to two decimal places. What is the chance variable X here?

2. Using the Poisson distribution, what is the probability that if a person
plays 76 hands of bridge he will get four aces x times? Make a probability
distribution of X in table form. What is the chance variable X?

3. Suppose a certain automatic screw machine produces one slotless screw on
the average out of every 100 screws. The screws are packaged in boxes of 100.
Using the Poisson distribution as an approximation to the binomial distribution,
what is the probability that a specified box will have x slotless screws?
What is the chance quantity X? Using the Poisson distribution, write down the
probability distribution of X (to three decimal places). What fraction of
boxes (of 100 screws) would you estimate to have no slotless screws? Not more
than 2 slotless screws?

4. A direct mail advertising firm finds that on the average one person out of
100 persons in small middle-western towns will send in orders for a certain
article from a mail advertisement. If the firm sends 50 letters to persons in
each of 200 such towns, from what percentages of the towns can the firm expect
to receive 0, 1, 2, 3, 4, 5, ... orders?

5. Suppose there is an average of one typographical error per ten pages of page
proof of a certain book. What percentages of pages would you estimate to have
0, 1, 2, 3, 4 errors? What is the probability that a 20-page chapter will
contain no errors?

6. Suppose that in normal summer driving in New Jersey a driver has an average
of one puncture per 2,000 miles. What is the probability that the driver will
have x punctures in making a 1,000-mile trip? Write down the probability
distribution of X in table form.

7. The probability that a man aged 35 will die before reaching the age of 40
is .016. What is the probability that x of the 50 alumni, 35 years old, of the
class of 1935 of college Z will die within five years? Write down the probability
distribution of X in table form.

8. The following table shows the distribution of numbers of vacancies occurring
per year in the U. S. Supreme Court by years from 1837 to 1932 (data compiled
by Wallis):

   No. vacancies per year    Frequency
             0                   59
             1                   27
             2                    9
             3                    1

Fit a Poisson distribution to this observed distribution.

9. The following table shows the distribution of number of articles turned in
per day to the lost and found bureau of a large office building for a period of
423 days, excluding summer months, Sundays and holidays (data compiled by
Thorndike):

   No. of articles per day    Frequency
             0                   169
             1                   134
             2                    74
             3                    32
             4                    11
             5                     2
             6                     0
             7                     1

Fit a Poisson distribution to these data.

10. The following table shows the distribution of number of deaths of soldiers
in individual Prussian cavalry corps due to kicks from horses in 200 corps-years
(data compiled by Bortkiewicz):

   No. deaths per corps-year    Frequency
             0                     109
             1                      65
             2                      22
             3                       3
             4                       1

Fit a Poisson distribution to these data.
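As a sketch of the fitting procedure these exercises call for (our own Python illustration, not a prescribed method), the data of Exercise 8 can be fitted by taking m equal to the observed mean number of vacancies per year and multiplying each Poisson probability by the total number of years. The dictionary name `observed` is only an illustrative choice.

```python
import math

# Exercise 8: vacancies per year in the U. S. Supreme Court, 1837-1932.
observed = {0: 59, 1: 27, 2: 9, 3: 1}

total = sum(observed.values())                          # number of years
m = sum(x * f for x, f in observed.items()) / total     # observed mean, used as m = np

# Fitted frequency = total * m^x e^(-m) / x!
fitted = {x: total * m ** x * math.exp(-m) / math.factorial(x) for x in observed}

print(f"m = {m}")
for x in observed:
    print(f"{x} vacancies: observed {observed[x]:>2}, fitted {fitted[x]:5.1f}")
```

Here m comes out to .5, so the value e^(-.5) = .6065 from Table 7.5 is all that is needed by hand. A more careful fit would treat the last class as "3 or more" so that the fitted frequencies add exactly to the total.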





CHAPTER 8. THE NORMAL DISTRIBUTION


8.1 General Properties of the Normal Distribution.

The most important continuous probability distribution is the normal
or Gaussian distribution, which has been referred to in several previous sections,
particularly in Section 3.2. The cumulative distribution function F_N(x) for a
normal distribution having mean μ and standard deviation σ is given by the formula

(8.1)    $$F_N(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt .$$

We usually refer to F_N(x) as given by (8.1) as the cumulative normal distribution
with mean μ and standard deviation σ. If X is a (continuous) chance quantity
having a normal distribution with mean μ and standard deviation σ, then for any
specified value of x, say x', F_N(x') is simply the probability that X ≤ x', or
more briefly

(8.2)    Pr(X ≤ x') = F_N(x') .

Since the normal distribution is the distribution of a continuous chance quantity,
we can replace ≤ by < without affecting (8.2). If x' and x'' (x' < x'') are any
two specified values of x, then (formula (5.9))

(8.3)    Pr(x' < X ≤ x'') = F_N(x'') - F_N(x') .

The probability density function f_N(x), which is obtained by differentiating
F_N(x) with respect to x (see formula (5.13)), is

(8.4)    $$f_N(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} .$$


For x = -∞, F_N(x) = 0, and for x = +∞, F_N(x) = 1. But for all
practical purposes, the value of F_N(x) ranges from "nearly" 0 to "nearly" 1 as
x ranges from μ - 3σ to μ + 3σ. Values of F_N(x) for values of x ranging from
μ - 3σ to μ + 3σ by intervals of .1σ are given in Table 8.1.

TABLE 8.1

Values of the Cumulative Normal Distribution

    z       x          F_N(x)        z       x          F_N(x)
  -3.0   μ - 3.0σ      .0013         .1   μ + .1σ       .5398
  -2.9   μ - 2.9σ      .0019         .2   μ + .2σ       .5793
  -2.8   μ - 2.8σ      .0026         .3   μ + .3σ       .6179
  -2.7   μ - 2.7σ      .0035         .4   μ + .4σ       .6554
  -2.6   μ - 2.6σ      .0047         .5   μ + .5σ       .6915
  -2.5   μ - 2.5σ      .0062         .6   μ + .6σ       .7258
  -2.4   μ - 2.4σ      .0082         .7   μ + .7σ       .7580
  -2.3   μ - 2.3σ      .0107         .8   μ + .8σ       .7881
  -2.2   μ - 2.2σ      .0139         .9   μ + .9σ       .8159
  -2.1   μ - 2.1σ      .0179        1.0   μ + 1.0σ      .8413
  -2.0   μ - 2.0σ      .0227        1.1   μ + 1.1σ      .8643
  -1.9   μ - 1.9σ      .0287        1.2   μ + 1.2σ      .8849
  -1.8   μ - 1.8σ      .0359        1.3   μ + 1.3σ      .9032
  -1.7   μ - 1.7σ      .0446        1.4   μ + 1.4σ      .9192
  -1.6   μ - 1.6σ      .0548        1.5   μ + 1.5σ      .9332
  -1.5   μ - 1.5σ      .0668        1.6   μ + 1.6σ      .9452
  -1.4   μ - 1.4σ      .0808        1.7   μ + 1.7σ      .9554
  -1.3   μ - 1.3σ      .0968        1.8   μ + 1.8σ      .9641
  -1.2   μ - 1.2σ      .1151        1.9   μ + 1.9σ      .9713
  -1.1   μ - 1.1σ      .1357        2.0   μ + 2.0σ      .9773
  -1.0   μ - 1.0σ      .1587        2.1   μ + 2.1σ      .9821
   -.9   μ - .9σ       .1841        2.2   μ + 2.2σ      .9861
   -.8   μ - .8σ       .2119        2.3   μ + 2.3σ      .9893
   -.7   μ - .7σ       .2420        2.4   μ + 2.4σ      .9918
   -.6   μ - .6σ       .2742        2.5   μ + 2.5σ      .9938
   -.5   μ - .5σ       .3085        2.6   μ + 2.6σ      .9953
   -.4   μ - .4σ       .3446        2.7   μ + 2.7σ      .9965
   -.3   μ - .3σ       .3821        2.8   μ + 2.8σ      .9974
   -.2   μ - .2σ       .4207        2.9   μ + 2.9σ      .9981
   -.1   μ - .1σ       .4602        3.0   μ + 3.0σ      .9987
    0    μ             .5000

It will be convenient for applications of the normal distribution to
write (x - μ)/σ = z (or x = μ + zσ) and consider values of z corresponding to any
chosen value of x and vice versa. A column of values of z, for the various values
of x considered, is given in Table 8.1.
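Values like those in Table 8.1 can be computed from the error function, since F_N(x) = ½[1 + erf(z/√2)] with z = (x - μ)/σ. The Python sketch below (the name `normal_cdf` is our own) uses `math.erf`; its four-place values may occasionally differ from the printed table by one unit in the last place (for example, it gives .9772 at z = 2.0 where the table shows .9773).

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative normal distribution F_N(x) with mean mu and s.d. sigma,
    written as (1 + erf(z / sqrt 2)) / 2 where z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A few rows of Table 8.1, from z = -3.0 to 3.0 in steps of 1.0.
for k in range(-30, 31, 10):
    z = k / 10
    print(f"z = {z:5.1f}   F_N = {normal_cdf(z):.4f}")
```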

If we think of X as a chance quantity with mean μ and standard deviation σ,
then Z (where Z = (X - μ)/σ) is a chance quantity having mean 0 and standard
deviation 1. As a matter of fact, this statement is true whether X (and Z)
have normal distributions or not. For example, suppose X is a chance quantity
denoting the total number of heads obtained in throwing 10 coins. Then we know
from formula (6.5) that the mean of X is 5 (i.e., μ = 5), and from formula (6.7)
that the standard deviation of X is √(10·½·½) = 1.581 (i.e., σ = 1.581). If we
set Z = (X - 5)/1.581, then Z will be a chance quantity having mean 0 and standard
deviation 1. In this example, X (and also Z) is a discrete chance quantity, but
its cumulative probability graph can be closely approximated by a cumulative
normal distribution, as we shall see in Section 8.22.
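The claim that Z = (X - μ)/σ has mean 0 and standard deviation 1 even when X is not normal can be checked exactly for the 10-coin example. This Python sketch (our own illustration) computes the binomial probabilities directly and then the mean and variance of Z.

```python
import math

n, p = 10, 0.5
mu = n * p                          # 5
sigma = math.sqrt(n * p * (1 - p))  # 1.581...

# Exact probabilities of x heads in 10 tosses.
probs = {x: math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

# Z = (X - mu)/sigma takes the value (x - mu)/sigma with the same probability as X = x.
mean_z = sum(((x - mu) / sigma) * f for x, f in probs.items())
var_z = sum(((x - mu) / sigma) ** 2 * f for x, f in probs.items())
print(f"mean of Z = {mean_z:.6f}, standard deviation of Z = {math.sqrt(var_z):.6f}")
```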

It should be noticed from Table 8.1 that the values of F_N(x) corresponding
to μ ± zσ are symmetrical with respect to .5000 and the sum of the two
values of F_N(x) is 1. For example, for x = μ ± 1.5σ (i.e., for z = ±1.5) we
have F_N(x) = .5000 ± .4332. The sum of these two values of F_N(x) is clearly 1.

If a chance quantity X is known to have a normal distribution with a
specified mean μ and standard deviation σ, then one can determine the probability
of X falling into a given interval from Table 8.1.

Example: Suppose X is a chance quantity having a normal distribution with mean
30 and standard deviation 5. What is the probability that 26 < X < 40?

We have

Pr(26 < X < 40) = F_N(40) - F_N(26) .

To find the values of F_N(40) and F_N(26), we must make use of the relationship
between x and z for this problem. Since μ = 30 and σ = 5, we have

$$z = \frac{x - 30}{5} .$$

The values of z corresponding to 40 and 26 are (40 - 30)/5 = 2.0 and
(26 - 30)/5 = -.8. The values of F_N(40) and F_N(26) are given by entering
Table 8.1 for z = 2.0 and z = -.8, respectively. We find, therefore, that

F_N(40) = .9773,   F_N(26) = .2119 .

Hence we have

Pr(26 < X < 40) = Pr(-.8 < Z < 2.0) = .9773 - .2119 = .7654 .
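The same calculation in Python, as a sketch that replaces the table lookup by the error function (so the fourth decimal of individual values may differ slightly from Table 8.1):

```python
import math

def normal_cdf(x, mu, sigma):
    """F_N(x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 30, 5
prob = normal_cdf(40, mu, sigma) - normal_cdf(26, mu, sigma)   # Pr(26 < X < 40)
print(f"{prob:.4f}")   # prints 0.7654
```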

In applying the normal distribution, it is often convenient to talk
about the probability of X departing from the mean μ by not more than (or by more
than) a specified multiple of σ. For example, what is the probability that X
will differ from μ by not more than σ? By this we mean: What is the
value of Pr(μ - σ < X < μ + σ)? Or more briefly expressed: What is the value
of Pr(|X - μ| < σ)? We have

Pr(μ - σ < X < μ + σ) = Pr(-1 < Z < 1)

                      = .8413 - .1587 = .6826 ,

or more briefly

Pr(|X - μ| < σ) = .6826 .

Similarly, we see from Table 8.1

Pr(|X - μ| < 2σ) = Pr(μ - 2σ < X < μ + 2σ) = Pr(-2 < Z < 2) = .9546

and

Pr(|X - μ| < 3σ) = Pr(μ - 3σ < X < μ + 3σ) = Pr(-3 < Z < 3) = .9974 .
Expressed verbally and statistically we may say this:

If the measurements in a population are normally distributed with mean μ and
standard deviation σ, then

68.26% of the measurements deviate less than 1σ from μ,

95.46% of the measurements deviate less than 2σ from μ,

99.74% of the measurements deviate less than 3σ from μ.

This is a more precise form of the statements we made in Section 3.2.
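Since Pr(|X - μ| < kσ) = Pr(-k < Z < k) = erf(k/√2), these percentages can be computed without a table. A minimal Python sketch (the function name `prob_within` is our own): computed this way the three values are 68.27%, 95.45% and 99.73%; the small differences from the figures above come from the four-place entries of Table 8.1.

```python
import math

def prob_within(k):
    """Pr(|X - mu| < k * sigma) for a normal X, i.e. Pr(-k < Z < k) = erf(k / sqrt 2)."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma of the mean: {100 * prob_within(k):.2f}%")
```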
The graph of F_N(x) is shown in Figure 8.1, with the scales of both x
and z indicated. For convenience, we place the z scale on a line slightly
below the x scale. In applications, the x values which are actually marked off
on the x axis are conveniently chosen values which are not necessarily those
shown in Figure 8.1, but it is usually convenient to show at least the following
values of z: -3, -2, -1, 0, 1, 2, 3.

Graph of the Cumulative Normal Distribution
Figure 8.1

The graph of the probability density function f_N(x) as given by (8.4)
is shown in Figure 8.2. The relationship between the graph of f_N(x) in Figure
8.2 and that of F_N(x) in Figure 8.1 is exactly the same as that between the
graph in Figure 5.8 and the graph in Figure 5.6. To repeat: the numerical
value of the ordinate of the graph in Figure 8.1 at any value, say x', is equal
to the numerical value of the area under the curve in Figure 8.2 to the left
of x' (the shaded area).
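The statement that the ordinate of F_N(x) at x' equals the area under f_N(x) to the left of x' can be illustrated numerically. In this Python sketch (our own; the trapezoidal rule and the cut-off at μ - 10σ standing in for -∞ are assumptions of the illustration), the area is approximated and compared with the error-function value of F_N(x').

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density f_N(x) of formula (8.4)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative normal F_N(x), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def area_to_left(x_prime, mu=0.0, sigma=1.0, steps=20_000):
    """Trapezoidal-rule area under f_N between mu - 10*sigma (standing in for
    minus infinity) and x_prime."""
    a = mu - 10 * sigma
    h = (x_prime - a) / steps
    inner = sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    ends = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(x_prime, mu, sigma))
    return h * (inner + ends)

for x_prime in (-1.0, 0.0, 1.5):
    print(f"x' = {x_prime:4.1f}: area {area_to_left(x_prime):.6f}, F_N(x') {normal_cdf(x_prime):.6f}")
```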

Actually, we shall make very little direct use of the probability
density function f_N(x). Figure 8.2 is given merely to show how the graph of
f_N(x) looks. In practical applications of the normal distribution it is always
more convenient to work with the cumulative normal distribution F_N(x).

Graph of the Normal Probability Density Function
Figure 8.2

8.2 Some Applications of the Normal Distribution.

The normal probability distribution is very widely used in probability
and statistics problems. It is used mostly for the following purposes:

(1) To approximate or "fit" a distribution of measurements in
a sample under certain conditions.

(2) To approximate the binomial distribution and other discrete
or continuous probability distributions under suitable conditions.

(3) To approximate the distribution of means and certain other
quantities calculated from samples, especially large samples.

In the present section we shall discuss (1) and (2). But (3) must
be reserved for discussion in Chapter 9.

8.21 "Fitting" a cumulative distribution of measurements in a sample by a
cumulative normal distribution.

"Fitted" Cumulative Normal Distribution Graph for the
Cumulative Polygon of Figure 2.4
Figure 8.3

It is very often true that samples of measurements are such that their
cumulative frequency polygons (in case of grouped data) or cumulative graphs (for
ungrouped data) have a shape which can be fairly closely approximated by the
graph of the cumulative normal distribution one obtains by replacing μ in (8.1)
by the sample mean X̄ and σ in (8.1) by the sample standard deviation s_X. For
example, let us consider the cumulative polygon shown in Figure 2.4, which is
plotted from the cumulative frequencies in Table 2.2. You will recall from
Section 3.31 that

X̄ = 1.527,   s_X = .101 .


Using these values for μ and σ respectively, we have the following relationship
between z and x:

(8.5)    $$z = \frac{x - 1.527}{.101} .$$

In "fitting" the cumulative normal distribution, it will be sufficient
for our purpose to consider the following values of z: -3, -2, -1, 0, 1, 2, 3.
The values of x corresponding to these values of z as determined from formula
(8.5), and the values of F_N(x), are given in Table 8.2. You will note that x is
measured in ounces, but that z is a "dimensionless" number.


TABLE 8.2

Values of the Cumulative Normal Distribution F_N(x)
when Fitted to the Cumulative Polygon in Figure 2.4

    z       x       F_N(x)
   -3     1.224     .0013
   -2     1.325     .0227
   -1     1.426     .1587
    0     1.527     .5000
    1     1.628     .8413
    2     1.729     .9773
    3     1.830     .9987
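Table 8.2 follows from inverting (8.5) as x = 1.527 + .101z. A short Python sketch (our own illustration) that reproduces it:

```python
import math

x_bar, s_x = 1.527, 0.101   # sample mean and standard deviation (ounces), Section 3.31

def normal_cdf(z):
    """Standard cumulative normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Formula (8.5) solved for x: x = x_bar + z * s_x
for z in range(-3, 4):
    x = x_bar + z * s_x
    print(f"{z:>2}  {x:.3f}  {normal_cdf(z):.4f}")
```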


Plotting the values of F_N(x) shown in Table 8.2 and drawing a smooth curve
through them, we obtain the "fitted" cumulative normal distribution graph shown
in Figure 8.3. The cumulative polygon is also shown.

If one wanted to do a more accurate job of constructing the "fitted"
graph, he would use more closely spaced values of z and, of course, more of
them; e.g., z = -3, -2.5, -2, -1.5, ..., 2.5, 3.

The degree of "goodness of fit" depends, of course, on how near the
"fitted" curve and cumulative polygon are to each other. By reading ordinates
off the "fitted" curve at upper boundaries of cells, one obtains "fitted"
cumulative frequencies. For example, from Figure 8.3 we find the "fitted"
cumulative frequencies shown in Table 8.3 (reading to the nearest .5).


TABLE 8.3

Comparison of Observed and Fitted Frequencies
from Figure 8.3

   cell       observed     "fitted"
 midpoint    cumulative   cumulative    observed    "fitted"
             frequency    frequency    frequency   frequency
   1.25           0            .5           0           .5
   1.30           1           1.5           1          1.0
   1.35           6           5.0           5          3.5
   1.40          12          11.5           6          6.5
   1.45          25          23.5          13         12.0
   1.50          33          37.5           8         14.0
   1.55          50          50.5          17         13.0
   1.60          64          63.0          14         12.5
   1.65          71          70.0           7          7.0
   1.70          72          73.5           1          3.5
   1.75          75          74.5           3          1.0
   1.80          75          75.0           0           .5
   Total                                   75         75
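The "fitted" columns of Table 8.3 were read from the graph; they can also be computed directly as 75 × F_N(upper cell boundary). The Python sketch below does this under an assumption not stated in this section: that each cell has width .05, so that its upper boundary is its midpoint plus .025. Because the book's values are graph readings to the nearest .5, the computed numbers agree with them only approximately.

```python
import math

x_bar, s_x, total = 1.527, 0.101, 75

def normal_cdf(z):
    """Standard cumulative normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

midpoints = [round(1.25 + 0.05 * i, 2) for i in range(12)]   # 1.25, 1.30, ..., 1.80

previous = 0.0
for mid in midpoints:
    upper = mid + 0.025                  # assumed upper cell boundary
    cumulative = total * normal_cdf((upper - x_bar) / s_x)
    print(f"{mid:.2f}  fitted cumulative {cumulative:5.1f}  fitted cell {cumulative - previous:5.1f}")
    previous = cumulative
```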


8.22 "Fitting" a cumulative binomial distribution by a cumulative normal distribution.

In Chapter 7 we discussed the Poisson distribution as a simplified
approximation to the binomial distribution when n is "large" and p is "small".
We can use the normal distribution to very good advantage in approximating the
binomial distribution under other conditions, particularly when p is not too
"close" to 0 or 1, and when n is "large" or even "moderately large", or, roughly
speaking, when np is at least 5. It turns out that for a specified value of p,
no matter how small, the cumulative normal distribution provides an approximation
to the cumulative binomial distribution which gets better and better as n
increases, the approximation becoming perfect in the limit as n increases
indefinitely.

Now how do we actually use the normal distribution to approximate the
binomial distribution? As mentioned earlier, we approximate the cumulative
binomial distribution by means of the cumulative normal distribution. Or,
graphically expressed: we approximate the cumulative binomial probability graph
(a step-like graph of the type shown in Figure 5,S) by the cumulative normal
distribution graph (a smooth curve of the type shown in Figure 8.1).

The binomial probability distribution is (6.3), and its cumulative
distribution F_B(x) for any value x' is given by

(8.6)    $$F_B(x') = C^n_0 p^0 q^n + C^n_1 p^1 q^{n-1} + \cdots + C^n_{[x']} p^{[x']} q^{n-[x']} ,$$

where [x'] is the largest integer which does not exceed x'.

You will remember from Chapter 6 that the mean and standard deviation
of the binomial distribution (6.3) are np and √(npq), respectively. The cumulative
normal distribution F_N(x) which approximates the cumulative binomial distribution
given by (8.6) is obtained by making the following substitutions in (8.1):
μ = np and σ = √(npq). More explicitly, the approximating cumulative normal
distribution function F_N(x) is such that, for a given value x',

(8.7)    $$F_N(x') = \frac{1}{\sqrt{2\pi}\,\sqrt{npq}}\int_{-\infty}^{x'} e^{-\frac{(t-np)^2}{2npq}}\,dt .$$

In other words, F_N(x') is an approximation to F_B(x'). But the real
question is this: How good is this approximation? To give a full mathematical
discussion of this question is a difficult matter which is beyond the scope of
this course. At this point it may be sufficient to give two examples and to
make this statement: For any specified values of p and x', the difference between
F_N(x') and F_B(x') approaches zero as n increases indefinitely.
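The behavior described here can be illustrated (though of course not proved) numerically. The Python sketch below, our own illustration with p = .1 chosen arbitrarily, finds the largest difference |F_N(x') - F_B(x')| over the integers x' = 0, 1, ..., n for several values of n; the difference shrinks steadily as n grows.

```python
import math

def binomial_pmf(k, n, p):
    """C(n, k) p^k q^(n - k), computed with logarithms to avoid overflow."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))

def normal_cdf(x, mu, sigma):
    """F_N(x) for mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def max_gap(n, p):
    """Largest |F_N(x') - F_B(x')| over x' = 0, 1, ..., n, with mu = np, sigma = sqrt(npq)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    cumulative, gap = 0.0, 0.0
    for x in range(n + 1):
        cumulative += binomial_pmf(x, n, p)          # running value of F_B(x)
        gap = max(gap, abs(normal_cdf(x, mu, sigma) - cumulative))
    return gap

for n in (10, 100, 1000):
    print(f"n = {n:>4}: largest difference {max_gap(n, 0.1):.3f}")
```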

Example 1: Let us fit a cumulative normal distribution to the cumulative
binomial distribution in the case in which n = 10, p = 1/2.

We calculate the binomial probabilities from the formula (6.3) by
putting n = 10, p = 1/2, i.e., from

(8.8)    $$f_B(x) = C^{10}_x \left(\frac{1}{2}\right)^{10}$$

for x = 0, 1, 2, 3, 4, ..., 10. You will remember that these are the probabilities
of getting 0, 1, 2, 3, 4, ..., 10 heads in tossing 10 coins. We get the
probability distribution f_B(x) and cumulative probability distribution F_B(x)
shown in Table 8.4. Now we need to calculate values of the cumulative normal
distribution for several values of z. In this problem we shall consider all
values of z from -3 to +3 by intervals of .5. For values of μ and σ we have

μ = np = 5,    σ = √(npq) = √(10·½·½) = 1.581 .

The relationship between x and z is given by

$$z = \frac{x - 5}{1.581} .$$

Values of z, x and F_N(x) are given in Table 8.5.

The graphs of the cumulative binomial distribution F_B(x) and the cumulative
normal distribution F_N(x) are shown in Figure 8.4 for the case n = 10, p = 1/2.

TABLE 8.4

Binomial Distribution and Cumulative Binomial
Distribution for n = 10, p = 1/2

    x     f_B(x)    F_B(x)
    0      .001      .001
    1      .010      .011
    2      .044      .055
    3      .117      .172
    4      .205      .377
    5      .246      .623
    6      .205      .828
    7      .117      .945
    8      .044      .989
    9      .010      .999
   10      .001     1.000
  Total   1.000
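Table 8.4 can be reproduced exactly with rational arithmetic. A Python sketch (our own illustration) using `fractions.Fraction`, rounding only for display:

```python
from fractions import Fraction
from math import comb

n = 10
# f_B(x) = C(10, x) (1/2)^10, formula (8.8), kept as exact fractions.
f_b = [Fraction(comb(n, x), 2 ** n) for x in range(n + 1)]

cumulative = Fraction(0)
for x, f in enumerate(f_b):
    cumulative += f                      # running value of F_B(x)
    print(f"{x:>2}  {float(f):.3f}  {float(cumulative):.3f}")
```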
To actually carry out the arithmetical work of "fitting" we would need tables
similar to Table 8.4 and Table 8.5. To avoid unnecessary repetition, we shall
not write down the tables here. We simply present the results graphically in
Figure 8.5 (which is similar to Figure 8.4). The "fit" is seen to be very
satisfactory.

Graphs of F_B(x) and of F_N(x) for the case
n = 50, p = .1
Figure 8.5

One may easily ask what point there is in fitting a cumulative binomial
distribution by means of a cumulative normal distribution. The answer
is that it is easier to calculate probabilities by using a cumulative normal
distribution than by using a binomial distribution. For instance, to approximate
the value of

(8.8)    $$f_B(x) = C^{10}_x \left(\frac{1}{2}\right)^{10}$$

for a single value of x, say x = 6, by using the normal approximation, we would
proceed as follows: referring to Figure 8.4, we note that f_B(6), which is the
size of the jump in the step-like graph of the cumulative binomial distribution,
is approximately equal to the difference F_N(6.5) - F_N(5.5), i.e., the difference
between the ordinates of the fitted cumulative normal distribution at x = 6.5
and 5.5. This gives us the approximate result f_B(6) ≈ F_N(6.5) - F_N(5.5).

To approximate the value of a sum of terms of the binomial distribution
(8.8), e.g., for x = 4, 5, 6, we would have

f_B(4) + f_B(5) + f_B(6)

approximately equal to

[F_N(4.5) - F_N(3.5)] + [F_N(5.5) - F_N(4.5)] + [F_N(6.5) - F_N(5.5)]

    = F_N(6.5) - F_N(3.5) .

In this example n = 10, p = 1/2, μ = 5, σ = 1.581. The values of z corresponding
to x = 6.5 and x = 3.5 are (6.5 - 5)/1.581 = .95 and (3.5 - 5)/1.581 = -.95,
respectively. Since Table 8.1 gives values of z to only one decimal place, we
interpolate to obtain approximate values of F_N(6.5) and F_N(3.5). We find
F_N(3.5) = .1714 (actually F_N(3.5) ≈ .1711) and F_N(6.5) = .8286, and hence we
obtain F_N(6.5) - F_N(3.5) = .657. The exact value of f_B(4) + f_B(5) + f_B(6)
from Table 8.4 is seen to be .205 + .246 + .205 = .656, which is close to the
approximate value .657 obtained by taking the difference F_N(6.5) - F_N(3.5).
With sufficiently extensive tables of values of z, one can easily and rapidly
find the approximate value of the sum of any number of consecutive terms of a
binomial distribution by simply taking a difference of the fitted cumulative
normal distribution for two values of x, assuming, of course, that the
conditions of satisfactory approximation are satisfied.
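The comparison just made can be repeated in Python without interpolation, as a sketch (our own illustration) that computes the exact sum from the binomial coefficients and the approximation from the error function:

```python
import math

n, p = 10, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))    # 5 and 1.581...

def normal_cdf(x):
    """Fitted cumulative normal F_N(x) with the mean and s.d. above."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Exact f_B(4) + f_B(5) + f_B(6) = (210 + 252 + 210) / 1024.
exact = sum(math.comb(n, x) for x in (4, 5, 6)) / 2 ** n
# Normal approximation using the half-integer points 3.5 and 6.5.
approx = normal_cdf(6.5) - normal_cdf(3.5)
print(f"exact {exact:.5f}, approximate {approx:.5f}")
```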

To get some notion of the gain in accuracy one obtains by evaluating
F_N(x) at points halfway between integers, let us consider the case of n = 400
and p = 1/2. For instance, if one wanted to obtain an approximation to the
probability that, if a coin is tossed 400 times, the number of heads will lie
between 195 and 205 inclusive, we might use F_N(205) - F_N(195) as an
approximation. Here μ = np = 200 and σ = √(npq) = 10, and the two values of z
corresponding to x = 195 and 205 are -.5 and +.5, respectively. Hence, the
approximate value of the desired probability is .6915 - .3085 = .3830. The
closer approximation, using points halfway between integers, would be given by
F_N(205.5) - F_N(194.5). The values of z would be -.55 and +.55 and the
approximate probability would be .7088 - .2912 = .4176. The difference between the
two approximations is .4176 - .3830 = .035. The difference would be smaller if
the approximations involved evaluations of F_N(x) for values of x further away
from the mean 200 than those just considered, i.e., further from 200 than 195
or 205. In general, unless n is larger than about 400, the accuracy gained by
using half-integer positions is worthwhile.
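The comparison can be made concrete by also computing the exact binomial probability. This Python sketch (our own illustration) does so for 195 to 205 heads inclusive, the range to which the half-integer limits 194.5 and 205.5 correspond; the half-integer version lies much closer to the exact value.

```python
import math

n = 400
mu, sigma = n / 2, math.sqrt(n / 4)          # np = 200 and sqrt(npq) = 10

def normal_cdf(x):
    """Fitted cumulative normal F_N(x) for 400 tosses of a fair coin."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Exact Pr(195 <= X <= 205), using exact integer arithmetic for the binomial terms.
exact = sum(math.comb(n, k) for k in range(195, 206)) / 2 ** n
at_integers = normal_cdf(205) - normal_cdf(195)
at_half_integers = normal_cdf(205.5) - normal_cdf(194.5)
print(f"exact {exact:.4f}  at integers {at_integers:.4f}  at half-integers {at_half_integers:.4f}")
```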


8.3 The Cumulative Normal Distribution on Probability Graph Paper.

You will recall from Section 2.3 that reference was made to probability
graph paper, which had the property of making cumulative polygons appear as
approximately straight lines. This is accomplished by the way in which the
percentage scales for low percentages and high percentages are stretched. Now if
the cumulative normal distribution F_N(x) is graphed on probability paper, the
graph obtained is exactly a straight line. Figure 8.6 shows how the graph of
the cumulative normal distribution F_N(x) (as graphed in Figure 8.1) becomes a
straight line on probability graph paper.

Notice that the 50th percentile of the cumulative normal distribution
is μ, the mean. The 84th (more precisely the 84.13th) percentile is μ + σ.
Thus σ is the difference between the 84th percentile and the 50th percentile.
Thus if one were to draw any straight line through the center of a sheet of
probability graph paper in the direction from lower left to upper right so that
it would intersect the upper and lower edges of the graph paper, one would have
a graph of a cumulative normal distribution. If the scale for x has been laid
off, one could determine the value of μ graphically by getting the 50th
percentile, and the value of σ graphically by taking the difference between the
84th and 50th percentiles. This procedure is very useful in quickly making rough
"estimates" of the mean and standard deviation of a sample distribution of
measurements, when the distribution is "approximately" normal.
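Probability graph paper in effect places each cumulative proportion F at the height z whose standard cumulative normal value is F; for a normal distribution that height is (x - μ)/σ, a straight line in x with slope 1/σ. The Python sketch below (our own illustration, using `statistics.NormalDist` from the standard library, Python 3.8 or later) shows this for the normal fitted in Section 8.21 and then reads μ and σ back from the 50th and 84.13th percentiles.

```python
from statistics import NormalDist

fitted = NormalDist(mu=1.527, sigma=0.101)   # normal fitted in Section 8.21 (ounces)
standard = NormalDist()                       # mean 0, standard deviation 1

# Height on probability paper: the standard normal z whose cumulative value is F.
# For a normal F_N this equals (x - mu)/sigma, so the points lie on a straight line.
for x in (1.30, 1.45, 1.60, 1.75):
    F = fitted.cdf(x)
    print(f"x = {x:.2f}  F = {F:.4f}  height = {standard.inv_cdf(F):+.3f}")

# Reading the mean and standard deviation back from percentiles of the line.
mu_hat = fitted.inv_cdf(0.50)                 # 50th percentile
sigma_hat = fitted.inv_cdf(0.8413) - mu_hat   # 84.13th percentile minus 50th
print(round(mu_hat, 3), round(sigma_hat, 3))
```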

As an example, suppose we return to Figure 2.5 and draw "by eye" a
straight line that seems to "fit" the polygon satisfactorily. A good way to
"fit" such a line is to adjust a stretched black thread by trial and error until
the fit looks "reasonable", and then mark off a point through which the thread
passes near each end of the cumulative polygon. Then draw a straight line
through the two points. Performing this operation, we get Figure 8.7. Taking
the 50th and 84th percentiles with respect to the straight line, we find 1.53
and 1.63. These are the rough graphical estimates of X̄ and X̄ + s_X, respectively.
Thus the probability graph paper estimates of X̄ and s_X are 1.53 and .1,
respectively, which are to be compared with the arithmetically determined values
1.527 and .101, respectively. There are ways of "fitting" straight lines which
are mathematically more "precise" than the one just described. One of these,
the method of "least squares", will be discussed in Chapter 13.

Exercise 8.

1. If X is a chance quantity having a normal distribution with μ = 13 and
σ = 4, find the value of the probability

(a) that X < 20.

(b) that 10 < X.

(c) that 10 < X < 20.

(d) that X differs from 13 by more than 6.

2. The scores made by candidates on the Scholastic Aptitude Test of the College
Entrance Examination Board are normally distributed with mean 500 and standard
deviation 100. What percent of the candidates receive scores

(a) exceeding 700?

(b) less than 400?

(c) between 400 and 600?

(d) which differ from 500 by more than 150 points?

(e) If a candidate gets a score of 680, what percent
of the candidates have higher scores than he?

3. Fit a cumulative normal distribution to the data in problem No. 6 of
Exercise 2.2. Graph the cumulative normal distribution you obtain, and also the
cumulative polygon (a) on ordinary graph paper and (b) on probability graph
paper.

4. Fit a normal distribution to the data in problem No. 7 of Exercise 2.2.
Graph the cumulative normal distribution you obtain, and also the cumulative
polygon (a) on ordinary graph paper and (b) on probability graph paper.

5. Fit a cumulative normal distribution to the data in problem No. 4 of
Exercise 3.3. Graph the cumulative normal distribution you obtain as well as the
cumulative polygon (a) on ordinary graph paper and (b) on probability graph
paper.

6. You are supposed to have done at least one of the following problems in
Exercise 2.2: Nos. 8, 9, 10, 11, 12, 13, 14. Fit a normal distribution to
the distribution (or distributions) of sample data on measurements you obtained.
Graph the fitted cumulative normal distribution and the cumulative polygon (a)
on ordinary graph paper, and (b) on probability graph paper.

7. A sack of 400 nickels is emptied onto a table. Using a cumulative normal
distribution, approximate the probability that:

(a) more than 250 heads will turn up,

(b) the number of heads will be less than 190,

(c) the number of heads will lie between 170 and
230 inclusive.

8. A die is rolled 720 times. Using an approximating cumulative normal
distribution, estimate the probability that:

(a) more than 130 "sixes" will turn up,

(b) the number of "sixes" obtained will lie between
100 and 140 inclusive.

9. It is known that the probability of dealing a bridge hand with at least one
ace is approximately .7. If a person plays 100 hands of bridge, what is the
approximate probability that he will receive at most 20 hands which will contain
no aces?

10. Fit a cumulative normal distribution to the cumulative binomial distribution
for the case in which n = 5, p = .4. Graph both cumulative distributions
on ordinary graph paper.

11. Let X be a (discrete) chance quantity denoting the total number of dots
obtained in throwing three dice. Work out the probability distribution of X.
Fit a cumulative normal distribution to the cumulative distribution of X. Graph
the two cumulative distributions on ordinary graph paper. From each cumulative
distribution, calculate the probability that the total number of dots will be
(a) at least 12, (b) 8, 9, 10, 11 or 12.

12. Fit the cumulative probability distribution of the discrete chance quantity
X in problem No. 6 of Exercise 5.1 by a cumulative normal distribution, and graph
the two cumulative distributions.

13. Fit a normal distribution to the cumulative probability distribution of the
continuous chance quantity X in problem No. 3 of Exercise 5.3. 


CHAPTER 9. ELEMENTS OF SAMPLING

9.1 Introductory Remarks.

In Chapter 4 we mentioned that there are two approaches to the study
of sample-to-sample fluctuations of sample statistics: experimental and mathematical.
The mathematical approach is founded on the theory of probability, and
we shall find that the normal distribution plays a very important role in it.

There are two types of sampling which we shall consider: (1) sampling
from a finite population, and (2) sampling from an indefinitely large population.

Each of these two types of sampling has an experimental and a mathematical
or theoretical aspect. We shall discuss the two types of sampling in
turn.

9.2 Sampling from a Finite Population.

First let us consider a very simple example and illustrate what we
mean by experimental sampling and what we mean by mathematical or theoretical
sampling from a finite population.

9.21 Experimental sampling from a finite population.

Suppose we have 6 chips, marked with the numbers 1, 2, 3, 4, 5, 6,
respectively, and placed in a bowl. The composition of this bowl of chips may
be described by Table 9.1. If we stir these chips thoroughly and draw out 3
chips simultaneously (or one after another without replacement until 3 chips
are drawn), we are experimentally drawing a sample of 3 chips from the finite
population of 6 chips having the following distribution of values of X:


TABLE 9.1

Composition of Bowl of Six Chips

   x      Frequency    Relative frequency (probability)
   1          1                  1/6
   2          1                  1/6
   3          1                  1/6
   4          1                  1/6
   5          1                  1/6
   6          1                  1/6
 Total        6                  1




If we put the 3 chips back in the bowl and stir them with the others, we can
repeat the process again and again as many times as we please. Every time we
repeat the process we get an experimental sample of 3 chips. Now suppose we
are interested in the mean X̄ of the numbers on the three chips in a sample.

In drawing 100 experimental samples, let us say, from the bowl we would get
100 means, ranging between 2 and 5. In an actual experiment of drawing 100
small samples, the distribution of the sample means X̄ obtained is given
in Table 9.2. A column showing the sample sum S(X) is also given. The relation
between X̄ and S(X) in this example is 3X̄ = S(X).

TABLE 9.2

Frequency Distribution of X̄ (and S(X)) in 100 Experimentally Drawn
Samples of 3 Chips from the Bowl with Composition Given in Table 9.1

 S(X)     X̄      Frequency    Relative frequency
   6     2.00        6              .06
   7     2.33        6              .06
   8     2.67        8              .08
   9     3.00       17              .17
  10     3.33       18              .18
  11     3.67       12              .12
  12     4.00       14              .14
  13     4.33        8              .08
  14     4.67        5              .05
  15     5.00        6              .06
 Total             100             1.00


The second and fourth columns of Table 9.2 constitute the experimental
sampling distribution of means X̄ in the set of 100 experimentally drawn samples
of three chips from the given population of six chips. (Similarly, the first and
fourth columns constitute the experimental sampling distribution of sample sums
S(X).) We could consider statistics other than the mean X̄ or sum S(X) calculated
from the successive samples, e.g., the standard deviation, the range, the
largest value, the median, the smallest value, etc. These statistics all have
experimental sampling distributions among the 100 samples too. But the mean is an
especially simple statistic to deal with, and we will stick to it in most of our
discussion of sampling.
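The stirring-and-drawing experiment is easy to imitate on a computer. The Python sketch below is only an illustration: it assumes that the pseudo-random generator of the standard `random` module is an adequate stand-in for "thorough" stirring, and the function name `experimental_sampling` and the seed value are arbitrary choices. It draws 100 samples of 3 chips, each sample without replacement, and tallies the sample sums S(X) as in Table 9.2.

```python
import random
from collections import Counter

CHIPS = [1, 2, 3, 4, 5, 6]  # the bowl of Table 9.1


def experimental_sampling(num_samples=100, sample_size=3, seed=None):
    """Draw num_samples samples of sample_size chips each (without replacement
    within a sample) and return the frequency distribution of the sums S(X)."""
    rng = random.Random(seed)  # stands in for "thorough stirring"
    sums = Counter()
    for _ in range(num_samples):
        # The chips are returned to the bowl between samples, so every
        # sample is drawn from all six chips.
        sample = rng.sample(CHIPS, sample_size)
        sums[sum(sample)] += 1
    return sums


freq = experimental_sampling(seed=1)
for s in sorted(freq):
    print(f"S(X) = {s:2d}   mean = {s / 3:.2f}   frequency = {freq[s]:3d}")
```

A different seed gives a different table, much as a second set of 100 actual drawings would.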

If another 100 samples were drawn, one would get a slightly different
frequency distribution or relative frequency distribution of values of X̄ (or
S(X)). If a larger number of samples were drawn, say 1000, one would find that
the relative frequencies would be more "stable" from one set of 1000 samples to
another than from one set of 100 samples to another. But the distribution of
X̄ for 100 samples as shown in Table 9.2 gives a good idea of how means of samples
of 3 from the given population of 6 chips vary. One can describe the distribution
in Table 9.2 graphically by the methods of Chapter 2, or arithmetically by
finding the mean $\bar{\bar X}$ and standard deviation $s_{\bar X}$ of the sample means. In fact, we
have


(9.1)   $$\bar{\bar X} = 3.46, \qquad s_{\bar X} = .79.$$


We give these values because we shall want to compare them with theoretical 
values to be worked out in the following paragraphs. 
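The values in (9.1) can be recomputed from the frequencies in Table 9.2. The sketch below expands the table into the 100 individual sample means. It assumes the divisor n − 1 for the standard deviation (Python's `statistics.stdev`), which reproduces .79; with divisor n (`statistics.pstdev`) the figure comes out .78.

```python
import statistics

# Table 9.2: sample sum S(X) -> frequency among 100 experimental samples of 3 chips
TABLE_9_2 = {6: 6, 7: 6, 8: 8, 9: 17, 10: 18, 11: 12, 12: 14, 13: 8, 14: 5, 15: 6}

# Expand the frequency table into the 100 individual sample means X-bar = S(X)/3
means = [s / 3 for s, f in TABLE_9_2.items() for _ in range(f)]

grand_mean = statistics.mean(means)  # mean of the 100 sample means
s_xbar = statistics.stdev(means)     # divisor n - 1 (an assumed convention)
print(round(grand_mean, 2), round(s_xbar, 2))
```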

The idea of experimental sampling from a finite population, as illustrated
by the foregoing simple example, can clearly be extended to drawing
samples of n "elements" from a finite population of N "elements".

Let us now turn to theoretical considerations and show how we can
make predictions as to what will actually happen in experimental sampling from
a finite population.


9.22 Theoretical sampling from a finite population.

Let us return to the example of a finite population of six chips
marked 1, 2, 3, 4, 5, 6, respectively. The number of samples of three chips
it is possible to draw out of this population of six chips is simply the number
of combinations of 6 objects taken 3 at a time, i.e., $C^6_3 = 20$. These 20
samples are as follows:


1, 2, 3    1, 2, 4    1, 2, 5    1, 2, 6    1, 3, 4
1, 3, 5    1, 3, 6    1, 4, 5    1, 4, 6    1, 5, 6
2, 3, 4    2, 3, 5    2, 3, 6    2, 4, 5    2, 4, 6
2, 5, 6    3, 4, 5    3, 4, 6    3, 5, 6    4, 5, 6.


The frequency distribution of the sum S(X) and mean X̄ in this set of
possible samples is given in Table 9.3. The second and fourth columns in
Table 9.3 constitute the theoretical sampling distribution of the mean X̄ of
samples of three chips from the finite population of six chips.
TABLE 9.3

Frequency Distribution of X̄ and S(X) Among the 20 Possible Samples
of 3 Chips from the Bowl with Composition Given in Table 9.1

 S(X)     X̄      Frequency    Probability
   6     2.00        1            .05
   7     2.33        1            .05
   8     2.67        2            .10
   9     3.00        3            .15
  10     3.33        3            .15
  11     3.67        3            .15
  12     4.00        3            .15
  13     4.33        2            .10
  14     4.67        1            .05
  15     5.00        1            .05
 Total              20           1.00


(Similarly the first and fourth columns constitute the theoretical 
sampling distribution of the sum S(X).) 
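Table 9.3 can be generated mechanically by listing all $C^6_3$ samples. The Python sketch below uses `itertools.combinations` for the listing and exact `Fraction` arithmetic, so that the mean and variance of X̄ come out as exact fractions (7/2 and 7/12) before the square root is taken; it anticipates the values given in (9.2).

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import combinations

CHIPS = [1, 2, 3, 4, 5, 6]
samples = list(combinations(CHIPS, 3))        # the 20 possible samples of 3 chips
table_9_3 = Counter(sum(s) for s in samples)  # S(X) -> frequency

for total in sorted(table_9_3):
    freq = table_9_3[total]
    print(f"S(X) = {total:2d}   mean = {total / 3:.2f}   "
          f"frequency = {freq}   probability = {freq / len(samples):.2f}")

# Every sample has probability 1/20; apply the usual mean and variance formulas exactly.
p = Fraction(1, len(samples))
mu_xbar = sum(p * Fraction(sum(s), 3) for s in samples)
var_xbar = sum(p * Fraction(sum(s), 3) ** 2 for s in samples) - mu_xbar ** 2
print(mu_xbar, var_xbar, round(math.sqrt(float(var_xbar)), 4))
```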

The theoretical sampling distribution in Table 9.3 is really nothing
but a probability distribution. It is a prediction of the distribution one
would get in experimental sampling by drawing a larger and larger number of
samples of 3. The accuracy of the prediction for the 100 experimental samples
we actually drew may be seen by comparing column 4 of Table 9.2 with column
4 of Table 9.3. As we pointed out before, if we had drawn another 100 samples
we would, in general, get a slightly different relative frequency distribution,
but not one "drastically different" from the theoretical distribution in column 4
of Table 9.3. If we had drawn 1000 samples we would "almost certainly" get a
relative frequency distribution closer to the theoretical distribution in
column 4 of Table 9.3, assuming "thorough" stirring after each sample.

Now we can get the mean $\mu_{\bar X}$ and standard deviation $\sigma_{\bar X}$ of the theoretical
sampling distribution of X̄ in Table 9.3 by the usual formulas (5.1)
and (5.3) for the mean and standard deviation of a probability distribution.
We have

(9.2)   $$\mu_{\bar X} = 3.5, \qquad \sigma_{\bar X} = .7638.$$
These theoretical values, computed from Table 9.3, are seen to predict closely
the experimental values in (9.1), $\bar{\bar X} = 3.46$ and $s_{\bar X} = .79$ respectively,
computed from Table 9.2.


We can also find from Table 9.3 the values of the mean $\mu_{S(X)}$ and
standard deviation $\sigma_{S(X)}$ of the sample sums S(X) in the 20 possible samples
from the population. They are

(9.3)   $$\mu_{S(X)} = 10.5, \qquad \sigma_{S(X)} = 2.291,$$

which, of course, can also be found directly from $\mu_{\bar X}$ and $\sigma_{\bar X}$. For since
3X̄ = S(X), we have $3\mu_{\bar X} = \mu_{S(X)}$ and $3\sigma_{\bar X} = \sigma_{S(X)}$.

If we should consider samples of one chip from our population of six
chips, there would be six possible samples. The theoretical sampling distribution
in this case is given by the first and third columns of Table 9.1; it is
the same as the distribution of values of X in the population itself. This
distribution has its own mean μ and standard deviation σ, having values

(9.4)   $$\mu = 3.5, \qquad \sigma = \sqrt{35/12} = 1.708.$$

In other words, (9.4) gives the values of the mean and standard deviation of
the population of chips. Expression (9.2) gives the values of the mean and
standard deviation of the distribution of the means of all possible samples
of three chips from this population. It will be seen that $\mu_{\bar X} = \mu$ and
$\sigma_{\bar X} = \sigma/\sqrt{5}$. (Also $\mu_{S(X)} = 3\mu$ and $\sigma_{S(X)} = 3\sigma/\sqrt{5}$.) But, of course, we have been
discussing a very simple case of theoretical sampling from a finite population.
Let us consider the general case.

9.23 The mean and standard deviation of means of all possible samples from a
finite population.

The real question here is this: What are the relationships between
$\mu_{\bar X}$ and μ and between $\sigma_{\bar X}$ and σ for more general theoretical sampling from
finite populations?
To answer this question, suppose we have a finite population of N
"elements", each "element" (e.g., a chip) having a number belonging to it, so
that there will be some distribution of these numbers. (The numbers do not have
to be the integers 1, 2, 3, ..., N; any "element" can have any number belonging to
it, provided not all the "elements" have the same number on them, for in that
case we would not really have a distribution of numbers.) Suppose this distribution
has mean μ and standard deviation σ. Now consider all possible samples
of n "elements" from this population. There are $C^N_n$ such samples. The theoretical
sampling distribution of the means X̄ of these samples has the following
mean and standard deviation respectively:




(9.5)   $$\mu_{\bar X} = \mu, \qquad \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}.$$


The first formula states that the mean of the means of all possible samples of
n elements is equal to the population mean. The second formula states that the
standard deviation of means of all possible samples of n elements is equal to
the population standard deviation times the factor $\frac{1}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$.

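The theoretical sampling distribution of the sample sums S(X) has the
following mean and standard deviation respectively:

(9.6)   $$\mu_{S(X)} = n\mu, \qquad \sigma_{S(X)} = \sigma\sqrt{n}\sqrt{\frac{N-n}{N-1}}.$$

This may be seen at once from the fact that $\mu_{S(X)} = n\mu_{\bar X}$ and $\sigma_{S(X)} = n\sigma_{\bar X}$.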

Note that if the number of elements N in the population is very large
compared with the number of elements n in the sample, then the factor
$\sqrt{(N-n)/(N-1)}$ is nearly 1, so that $\sigma_{\bar X}$ reduces to $\sigma/\sqrt{n}$ approximately, and $\sigma_{S(X)}$
reduces to $\sigma\sqrt{n}$ approximately.
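Formula (9.5) can be checked by brute force on any small population. The sketch below compares the mean and variance of X̄ over all $C^N_n$ samples with the values predicted by (9.5), for every sample size n. It uses exact `Fraction` arithmetic so that the comparison is exact; the seven-element population is made up for the purpose (deliberately lopsided, and with a repeated number), and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def mean_and_variance(values):
    """Mean and variance of a finite list treated as a whole population (divisor N)."""
    N = len(values)
    mu = Fraction(sum(values), N)
    return mu, Fraction(sum(v * v for v in values), N) - mu ** 2


def brute_force(xs, n):
    """Mean and variance of X-bar over all C(N, n) equally likely samples."""
    sample_means = [Fraction(sum(s), n) for s in combinations(xs, n)]
    return mean_and_variance(sample_means)


def by_formula_9_5(xs, n):
    """Mean of X-bar and the square of sigma_X-bar as given by (9.5)."""
    N = len(xs)
    mu, var = mean_and_variance(xs)
    return mu, var / n * Fraction(N - n, N - 1)


population = [2, 3, 3, 7, 10, 11, 15]  # hypothetical population of N = 7 elements
for n in range(1, len(population) + 1):
    assert brute_force(population, n) == by_formula_9_5(population, n)
print("(9.5) agrees with brute force for n = 1, ...,", len(population))
```

Comparing variances rather than standard deviations keeps everything in exact rational arithmetic, so the `assert` tests equality rather than closeness.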

Before we discuss the derivation of formulas (9.5), let us consider the
following example.
Example: Suppose we remove all face cards from a deck of playing cards. Of the
40 remaining cards, 4 will be marked 1, 4 will be marked 2, ..., 4 will be marked
10. What are the mean and standard deviation of the theoretical sampling
distribution of means of all possible samples of 10 cards from this "population" of 40
cards?

If X denotes the number on a card, the distribution of X in the population
of 40 cards is shown in Table 9.4.

TABLE 9.4

Distribution of the Numbers on the 40 Non-Face
Cards in a Deck of Playing Cards

   x     Frequency f(x)    Relative frequency (probability)
   1          4                    .1
   2          4                    .1
   3          4                    .1
   4          4                    .1
   5          4                    .1
   6          4                    .1
   7          4                    .1
   8          4                    .1
   9          4                    .1
  10          4                    .1
 Total       40                   1.0


The mean and standard deviation of this distribution are

$$\mu = 5.5, \qquad \sigma = \sqrt{8.25}.$$

The values of N and n are 40 and 10 respectively. Therefore, we have for the 
mean and standard deviation of the distribution of sample means 
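$$\mu_{\bar X} = 5.5, \qquad \sigma_{\bar X} = \sqrt{\frac{8.25}{10}}\sqrt{\frac{40-10}{40-1}} = \sqrt{\frac{8.25}{13}} = .797,$$

respectively. For the mean and standard deviation of sample sums we would have

$$\mu_{S(X)} = 55, \qquad \sigma_{S(X)} = \sqrt{10 \times 8.25}\sqrt{\frac{40-10}{40-1}} = \sqrt{\frac{825}{13}} = 7.97.$$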
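The arithmetic of the card example, as a short Python sketch (the variable names are illustrative; the population standard deviation uses divisor N, as for any finite population):

```python
import math

N, n = 40, 10
values = [v for v in range(1, 11) for _ in range(4)]  # Table 9.4: four cards of each value

mu = sum(values) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / N)  # divisor N: the whole population
fpc = math.sqrt((N - n) / (N - 1))                         # finite-population factor

mu_xbar, sigma_xbar = mu, sigma / math.sqrt(n) * fpc       # formulas (9.5)
mu_sum, sigma_sum = n * mu, sigma * math.sqrt(n) * fpc     # formulas (9.6)
print(mu, round(sigma ** 2, 2), round(sigma_xbar, 3), mu_sum, round(sigma_sum, 2))
```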


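The derivation of the formula $\mu_{\bar X} = \mu$ in (9.5) proceeds as follows.
Let $x_1, x_2, \ldots, x_N$ be the numbers belonging to the N elements in the population.
We may think of $(x_1, x_2, \ldots, x_{n-1}, x_n)$, $(x_1, x_2, \ldots, x_{n-1}, x_{n+1})$, ..., and finally
$(x_{N-n+1}, x_{N-n+2}, \ldots, x_N)$ as the $C^N_n$ possible samples. $\mu_{\bar X}$ is the mathematical
expectation of the means of all these samples. Since the probability for each
sample is $1/C^N_n$, this means that $\mu_{\bar X}$ is the mean of all $C^N_n$ sample means. Hence
$C^N_n\,\mu_{\bar X}$ is the sum of all sample means, i.e.,

(9.7)   $$C^N_n\,\mu_{\bar X} = \frac{1}{n}(x_1 + x_2 + \cdots + x_{n-1} + x_n) + \frac{1}{n}(x_1 + x_2 + \cdots + x_{n-1} + x_{n+1}) + \cdots + \frac{1}{n}(x_{N-n+1} + x_{N-n+2} + \cdots + x_N).$$

Now each one of the x's in the population will occur in $C^{N-1}_{n-1}$ of the samples.
For if a particular x is selected, a sample of n elements containing that x can
be formed by selecting n − 1 other x's from the remaining N − 1, and there are
$C^{N-1}_{n-1}$ ways of selecting those other n − 1 x's. Hence we can rewrite (9.7) as
follows by collecting the $x_1$'s, the $x_2$'s, etc.:

(9.8)   $$C^N_n\,\mu_{\bar X} = \frac{1}{n}C^{N-1}_{n-1}x_1 + \frac{1}{n}C^{N-1}_{n-1}x_2 + \cdots + \frac{1}{n}C^{N-1}_{n-1}x_N.$$

Now if we divide both sides of (9.8) by $C^N_n$ and note that $\frac{1}{n}C^{N-1}_{n-1}\big/C^N_n = \frac{1}{N}$, we have

(9.9)   $$\mu_{\bar X} = \frac{1}{N}(x_1 + x_2 + \cdots + x_N).$$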
But $\frac{1}{N}(x_1 + x_2 + \cdots + x_N)$ is the mean of the x's in the population, and we have
denoted this by μ. Therefore

(9.10)   $$\mu_{\bar X} = \mu,$$

which establishes the first formula of (9.5).
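The two counting facts behind (9.8) and (9.9) can be checked by direct enumeration for any small case. The sketch below assumes N = 7 and n = 3 purely for illustration; Python's `math.comb(a, b)` plays the role of $C^a_b$.

```python
from itertools import combinations
from math import comb

N, n = 7, 3                                # illustrative sizes
samples = list(combinations(range(N), n))  # elements labelled 0, ..., N-1

# Used in (9.8): each element appears in C(N-1, n-1) of the C(N, n) samples.
appearances = [sum(element in s for s in samples) for element in range(N)]
assert appearances == [comb(N - 1, n - 1)] * N

# Used in (9.9): (1/n) * C(N-1, n-1) / C(N, n) = 1/N, checked without division.
assert N * comb(N - 1, n - 1) == n * comb(N, n)
print(len(samples), appearances)
```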

The derivation of the variance $\sigma^2_{\bar X}$ is a little more involved, but
will be given for the benefit of those who wish to follow it through. To obtain
the value of $\sigma^2_{\bar X}$ we use the definition of the variance of a probability
distribution given in formula (5.3b). This means that $\sigma^2_{\bar X}$ is equal to the mean
of the squares of means of all possible samples minus the square of $\mu_{\bar X}$ (the
mean of means of all samples), or, written more briefly,

(9.11)   $$\sigma^2_{\bar X} = \frac{1}{C^N_n}\left\{\left[\frac{1}{n}(x_1 + x_2 + \cdots + x_n)\right]^2 + \cdots + \left[\frac{1}{n}(x_{N-n+1} + x_{N-n+2} + \cdots + x_N)\right]^2\right\} - \mu^2_{\bar X}.$$

When the quantity in each square bracket is squared, we find the sum of squares
of all x's in that bracket plus the sum of twice the product of all possible
pairs of x's in that bracket. But we know, by the same argument given
in arriving at formula (9.10), that the square of any given x, say $x_1^2$, occurs in
a total of $C^{N-1}_{n-1}$ samples. By the same method of reasoning, any particular product
of two different x's, e.g., $x_1x_2$, occurs in $C^{N-2}_{n-2}$ samples. Hence, by squaring out the terms
in (9.11) and collecting, we have

(9.12)   $$\sigma^2_{\bar X} = \frac{1}{C^N_n}\cdot\frac{1}{n^2}\left[C^{N-1}_{n-1}\left(x_1^2 + x_2^2 + \cdots + x_N^2\right) + 2C^{N-2}_{n-2}\sum_{i<j}x_ix_j\right] - \mu^2_{\bar X}.$$


But from (9.9) we have

(9.13)   $$\mu^2_{\bar X} = \frac{1}{N^2}(x_1 + x_2 + \cdots + x_N)^2 = \frac{1}{N^2}\left[\sum_{i=1}^{N}x_i^2 + 2\sum_{i<j}x_ix_j\right],$$

from which the value of $\sigma_{\bar X}$ in (9.5) may be obtained by taking square roots.
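One way to carry out the algebra leading from (9.12) and (9.13) to the formula for $\sigma_{\bar X}$ in (9.5) is sketched below. It uses only the two counting ratios $C^{N-1}_{n-1}/C^N_n = n/N$ and $C^{N-2}_{n-2}/C^N_n = n(n-1)/[N(N-1)]$, the result $\mu_{\bar X} = \mu$ of (9.10), and the population variance in the form of (5.3b), $\sigma^2 = \frac{1}{N}\sum x_i^2 - \mu^2$. The steps are left unnumbered so that they are not confused with the text's own equations.

```latex
\begin{aligned}
\sigma^2_{\bar X}
  &= \frac{1}{nN}\sum_{i=1}^{N} x_i^2
   + \frac{n-1}{nN(N-1)}\cdot 2\sum_{i<j} x_i x_j - \mu^2 \\
\sum_{i=1}^{N} x_i^2 &= N(\sigma^2 + \mu^2), \qquad
  2\sum_{i<j} x_i x_j = N^2\mu^2 - N(\sigma^2 + \mu^2) \\
\sigma^2_{\bar X}
  &= \frac{\sigma^2 + \mu^2}{n}
   + \frac{n-1}{n(N-1)}\bigl[(N-1)\mu^2 - \sigma^2\bigr] - \mu^2 \\
  &= \frac{\sigma^2}{n} - \frac{(n-1)\,\sigma^2}{n(N-1)}
   = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}.
\end{aligned}
```

The $\mu^2$ terms cancel exactly, which is why the result depends on the population only through σ.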

9.24 Approximation of distribution of sample means by normal distribution.

You will recall that there are $C^N_n$ possible samples of n elements in
a population of N elements. Starting with a fairly large population and considering
samples even of moderate size, it would be a hopeless task to compute
a table similar to Table 9.3 showing the probabilities in the distribution of
sample means. But we can calculate these probabilities approximately. Under
a very wide set of conditions this distribution can be approximately fitted by
a normal distribution. Roughly speaking, these conditions are that n be fairly
large (say 30 or greater) and N much larger (say 100 or greater). But if one
were to examine the conditions more precisely, one would find in certain special
cases that the fit is very good for n as small as 10 and N as small as 25.
For example, this would be true if the values of x in the population were equally
likely and equally spaced. On the other hand, the fit can be very bad for
such small values of n and N, particularly if the distribution of values of
x in the population is very lopsided or skewed. A full mathematical statement
of the degree of accuracy with which a cumulative theoretical sampling
distribution of sample means can be approximated by a cumulative normal
distribution is far beyond the scope of this course.

Now how is a normal distribution fitted to a distribution of sample
means? Just as in Chapter 8, we would fit the cumulative normal distribution
to the cumulative distribution of sample means. To carry out this fitting
process, we would need only the values of $\mu_{\bar X}$ and $\sigma_{\bar X}$ and Table 8.1, and proceed
as we did in fitting the cumulative normal distribution to a cumulative
polygon or to a cumulative binomial distribution.

The fitted distribution of X̄ has the equation

$$F(\bar x) = \frac{1}{\sigma_{\bar X}\sqrt{2\pi}}\int_{-\infty}^{\bar x} e^{-(t-\mu_{\bar X})^2/2\sigma^2_{\bar X}}\,dt,$$

where the values of $\mu_{\bar X}$ and $\sigma_{\bar X}$ are given by formulas (9.5). The table of
fitted probabilities would be given by Table 8.1 by replacing μ by $\mu_{\bar X}$, σ
by $\sigma_{\bar X}$, and X by X̄. The graph of the fitted cumulative normal distribution
would look like Figure 8.1, where μ is replaced by $\mu_{\bar X}$ and σ by $\sigma_{\bar X}$. In
any particular example we would have numerical values of the population size
N, the sample size n, the population mean μ, and the standard deviation σ.
Actually we would rarely carry out the process of fitting. We only use
the fitted cumulative normal distribution to approximate any particular
probability in which we may be interested. In determining such a probability, it
is not necessary to fit the complete distribution. An example will
make this clear.

Example: The distribution of weights in a population of 1000 students has
mean 148.2 lb. and standard deviation 5.4 lb. If a sample of 100 students is
picked "at random" from this population, what is the probability that the mean
weight of these 100 students will exceed 149 lb.?

We have N = 1000, n = 100, μ = 148.2, and σ = 5.4. Hence, by (9.5),

$$\mu_{\bar X} = 148.2, \qquad \sigma_{\bar X} = \frac{5.4}{\sqrt{100}}\sqrt{\frac{900}{999}} = .513.$$


The relationship (see Section 8.1) between Z and X̄ for this problem is

$$Z = \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}}.$$

Using the values of $\mu_{\bar X}$ and $\sigma_{\bar X}$ found, we have

$$Z = \frac{\bar X - 148.2}{.513}.$$

Now our original question was this: What is the probability that X̄ will exceed
149? X̄ will exceed 149 if and only if Z exceeds (149 − 148.2)/.513 (= 1.56). We may
write this briefly as follows:

$$\Pr(\bar X > 149) = \Pr\left(Z > \frac{149 - 148.2}{.513}\right) = \Pr(Z > 1.56).$$

But Pr(Z > 1.56) = 1 − Pr(Z < 1.56). Looking in Table 8.1 and interpolating for
z = 1.56, we find Pr(Z < 1.56) = .9406. Hence the answer to our question is

$$\Pr(\bar X > 149) = 1 - .9406 = .059.$$

This means that approximately 5.9% of the possible samples of 100 from the given
population of 1000 students have means exceeding 149 lb.

In a similar manner one can find approximately the probabilities in
the distribution of the sums S(X) in all possible samples. In this case the
relation between Z and S(X) is

$$Z = \frac{S(X) - \mu_{S(X)}}{\sigma_{S(X)}},$$

where $\mu_{S(X)}$ and $\sigma_{S(X)}$ are given in terms of N, n, μ and σ by formulas (9.6).

Asking probability questions in terms of X̄ is equivalent to asking
probability questions in terms of S(X). One can express a probability dependent
on X̄ as a probability dependent on S(X). For instance, in the example
just given, $\mu_{S(X)}$ would have the value 14,820 lb., and $\sigma_{S(X)}$ the value
$(5.4)\sqrt{100}\sqrt{900/999} = 51.3$ lb., and the relation between Z and S(X) would be

$$Z = \frac{S(X) - 14820}{51.3}.$$

From this relation one could ask: What is the approximate probability that the
total weight of 100 students would exceed 14,900 lb.? The value of Z for
S(X) = 14,900 is

$$Z = \frac{14900 - 14820}{51.3} = \frac{80}{51.3} = 1.56.$$

Thus

$$\Pr(S(X) > 14{,}900) = \Pr(Z > 1.56) = .059 = \Pr(\bar X > 149).$$

In other words, the probability of the mean weight of 100 students exceeding
149 lb. is exactly the same as the probability of the total weight of the
100 students exceeding 14,900 lb.
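Table 8.1 is not needed if the cumulative normal probability is computed from the error function, using the standard identity Pr(Z < z) = ½[1 + erf(z/√2)]. The sketch below redoes the weight example for both X̄ and S(X); `normal_cdf` is a helper defined here, not a library function. Because it works with unrounded values of $\sigma_{\bar X}$ and z, its result agrees with the table-based .059 after rounding.

```python
import math


def normal_cdf(z):
    """Pr(Z < z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


N, n, mu, sigma = 1000, 100, 148.2, 5.4
fpc = math.sqrt((N - n) / (N - 1))       # finite-population factor

sigma_xbar = sigma / math.sqrt(n) * fpc  # (9.5): about .513
sigma_sum = sigma * math.sqrt(n) * fpc   # (9.6): about 51.3

p_mean = 1 - normal_cdf((149 - mu) / sigma_xbar)      # Pr(X-bar > 149)
p_sum = 1 - normal_cdf((14900 - n * mu) / sigma_sum)  # Pr(S(X) > 14,900)
print(round(sigma_xbar, 3), round(sigma_sum, 1), round(p_mean, 3), round(p_sum, 3))
```

The two probabilities coincide because the event X̄ > 149 is the same event as S(X) > 14,900.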


Exercise 9.2.

1. In Problem No. 1 of Exercise 2.1, suppose a sample of 2 students is picked
at random. What is the probability that their average score would exceed 17?
That the sum of their scores would lie between 13 and 75 (excluding 13 and 75)?


2. Suppose a population of 50,000 candidates takes the Scholastic Aptitude Test
and that the scores are scaled in such a way that the mean of the scores in
the population is 500 and the standard deviation is 100. If 25 candidates are
picked at random from this population, what is the probability that their
average score will be less than 475? Greater than 550? Between 475 and 525?

3. If 45 percent of the 3600 Princeton undergraduates would answer "yes" to a
certain question, what is the approximate probability that in a random sample
of 100 students a majority would answer "yes" to the question?
(Hint: Each individual in the population can be considered as being "marked"
1 or 0: 1 if he says yes, and 0 otherwise. Thus in drawing a person, X is a
chance quantity which is equal to 1 if the person says "yes" and 0 otherwise.
The distribution of X is therefore as follows:

   x     Frequency    p(x)
   0       1980        .55
   1       1620        .45
 Total     3600       1.00

From this distribution you can easily find the values of μ and σ.)

4. Suppose a lot of 10,000 articles contains 20 percent defectives.
What is the approximate probability that a random sample of 400 articles from
this lot will contain more than 25% defectives?

5. Suppose 1000 chips, marked 1, 2, 3, ..., 1000 respectively, are put in a
bowl and mixed thoroughly. If 10 chips are drawn at random, approximately what
is the probability that the sum of the numbers will exceed 6000? (For your
information, $1 + 2 + \cdots + k = k(k+1)/2$ and $1^2 + 2^2 + \cdots + k^2 = k(k+1)(2k+1)/6$.)

6. Consider a fictitious life insurance company which has an insurance policy
on each of 100,000 persons aged 65, and suppose that the distribution of the
size x of policies (in $1000's) is as follows:
   x (in $1000's)    Relative frequency
         1                  .25
         2                  .03
         3                  .02
         4                  .02
         5                  .40
         6                  .02
         7                  .02
         8                  .01
         9                  .01
        10                  .15
        15                  .05
        20                  .01
        25                  0


Suppose 1000 of these persons will die within a year. Approximately, what is
the probability that the company will have to pay more than $6,000,000 in death
claims to beneficiaries of these persons?


9.3 Sampling from an Indefinitely Large Population.

Just as in the case of sampling from a finite population, there are
two approaches to the study of sample-to-sample fluctuations of sample statistics
in samples from an indefinitely large population: experimental and mathematical.
Experimental sampling from an indefinitely large population is not essentially
different from sampling from a finite population. It consists of actually carrying
out experimental operations, although these may be less obviously sampling
operations. For example, if a single die is thrown 10 times successively, we may
regard this as drawing a sample of 10 elements from an indefinitely large population
of potential throws of the die. This may be repeated again and again as
often as we like. If we stop with 100 samples, say, we could make a frequency
distribution of X̄ or S(X) analogous to Table 9.2. We shall not actually do this,
however, but shall base most of our discussion on theoretical sampling from an
indefinitely large population.

Conceptually, we may think of "drawing" such a sample as equivalent to
drawing 10 chips out of a bowl containing indefinitely many chips, where each
chip is marked 1, 2, 3, 4, 5 or 6. If the bowl of chips is to correspond to a
"true" die, the proportion of chips with each number marked on it would be 1/6,
and the mixing would be "thorough". Alternatively, we can think of putting
six chips marked 1, 2, 3, 4, 5, 6 in the bowl, mixing thoroughly, "drawing"
one chip, returning it to the bowl, and repeating this process 10 times. Thus
10 "drawings" of a single chip with replacement after each drawing is equivalent
to "drawing" a sample of 10 chips from an indefinitely large population.

9.31 Mean and standard deviation of theoretical distributions of means and sums
of samples from an indefinitely large population.
Actually, it is theoretical sampling rather than experimental sampling
from an indefinitely large population which is of greatest interest here. For
we can establish some fairly simple sampling theory of sample means and sample
sums in this case which can usefully predict what will happen in actual experimental
sampling from an indefinitely large population. We have discussed the
case of a finite population first because it is simpler conceptually and because
we can make use of some results obtained in that case for the case of the
indefinitely large population. The main results for the case of the finite population
are the formulas given by (9.5) and (9.6) and the fact that the distribution of
sample means for large n and much larger N is approximately a normal distribution.
Since the factor $\sqrt{(N-n)/(N-1)}$ approaches 1 as N increases, it is clear from
formulas (9.5) that the mean and standard deviation of the distribution of means
of samples of n elements from an indefinitely large population (i.e., N indefinitely
large) are

(9.18)   $$\mu_{\bar X} = \mu, \qquad \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}},$$

where μ and σ are the mean and standard deviation of the indefinitely large
population from which the samples are drawn. Similarly, when N is indefinitely
large, formulas (9.6) become

(9.19)   $$\mu_{S(X)} = n\mu, \qquad \sigma_{S(X)} = \sigma\sqrt{n}.$$
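How quickly the factor $\sqrt{(N-n)/(N-1)}$ in (9.5) and (9.6) approaches 1 can be seen numerically. The sketch below fixes n = 10 (an arbitrary choice) and lets N grow; `fpc` is simply a name chosen here for the factor.

```python
import math


def fpc(N, n):
    """Finite-population factor sqrt((N - n) / (N - 1)) from (9.5) and (9.6)."""
    return math.sqrt((N - n) / (N - 1))


n = 10
for N in (20, 100, 1_000, 100_000, 10_000_000):
    print(f"N = {N:>10,}   factor = {fpc(N, n):.5f}")
```

For n = 10 the factor is about .725 when N = 20, about .953 when N = 100, and already about .995 when N = 1000.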





You will recall from the discussion in Section 5.35 that the distribution
for an indefinitely large population is simply a probability distribution.
For example, the probability distribution

   x      1     2     3     4     5     6
  f(x)   1/6   1/6   1/6   1/6   1/6   1/6

may be regarded as the distribution of the number of dots obtained at a throw
in an indefinitely large population of throws of a single "true" die. The
cumulative probability distribution F(x) = x, which is graphed in Figure 5.5, may be
regarded as the cumulative distribution of values of pointer readings in an
indefinitely large population of spins of the "perfectly balanced" pointer
described in Section 5.21.

In drawing the successive elements of a sample from a finite population,
the probabilities change as more elements are withdrawn from the population.
This is why we treat sampling from a finite population as a problem
in combinations. But in drawing the successive elements of a sample from an
indefinitely large population, the probability that a chance quantity X has a
particular value in one drawing is not affected by the value of X in any other
drawing. In other words, the results of successive drawings from an indefinitely
large population are independent of each other.

Although formulas (9.18) and (9.19) were obtained from (9.5) and
(9.6) merely by letting N become indefinitely large, one can actually derive
them from scratch by another process. This process is important in the mathematical
theory of sampling and may be illustrated by considering samples of 2
elements from an indefinitely large population having the following general
distribution of a discrete chance quantity X:


   x       x_1       x_2       x_3      ...     x_k
  f(x)   f(x_1)    f(x_2)    f(x_3)    ...    f(x_k)


The probability that the chance quantity X has the value $x_\alpha$ in the first drawing
is $f(x_\alpha)$, and that it has the value $x_\beta$ in the second drawing is $f(x_\beta)$. The
drawings are independent, since we are sampling from an indefinitely large
population, and hence the probability that X has the values $x_\alpha$ and $x_\beta$ in the two
successive drawings is, by the law of multiplication of probabilities, $f(x_\alpha)\cdot f(x_\beta)$.

The mean of this sample of 2 elements is $(x_\alpha + x_\beta)/2$. The mean value
of the sample mean is obtained by multiplying $(x_\alpha + x_\beta)/2$ by the probability
$f(x_\alpha)\cdot f(x_\beta)$ and summing over all possible values of α and β, i.e.,


(9.20)   $$\mu_{\bar X} = \sum_{\alpha=1}^{k}\sum_{\beta=1}^{k}\frac{x_\alpha + x_\beta}{2}\,f(x_\alpha)\,f(x_\beta),$$

which may be broken into two sums as follows:

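(9.21)   $$\mu_{\bar X} = \frac{1}{2}\left[\sum_{\alpha=1}^{k}x_\alpha f(x_\alpha)\right]\left[\sum_{\beta=1}^{k}f(x_\beta)\right] + \frac{1}{2}\left[\sum_{\alpha=1}^{k}f(x_\alpha)\right]\left[\sum_{\beta=1}^{k}x_\beta f(x_\beta)\right].$$

But

$$\sum_{\alpha=1}^{k}f(x_\alpha) = \sum_{\beta=1}^{k}f(x_\beta) = 1, \qquad \sum_{\alpha=1}^{k}x_\alpha f(x_\alpha) = \sum_{\beta=1}^{k}x_\beta f(x_\beta) = \sum_{i=1}^{k}x_i f(x_i) = \mu$$

(it does not matter what letter we use for the subscripts), where μ is the
population mean. Hence

$$\mu_{\bar X} = \frac{1}{2}\mu + \frac{1}{2}\mu = \mu.$$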

By following the same line of reasoning, we find that for samples of size n

(9.22)   $$\mu_{\bar X} = \frac{1}{n}\mu + \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu \quad (n \text{ terms}) = \mu.$$

In other words, the mean of the distribution of means of an indefinitely
large number of samples from an indefinitely large population is equal to the
mean of the population.

Similarly, by applying the principle expressed by (5.3b), the variance
of X̄ for samples of 2 elements is given by

(9.23)   $$\sigma^2_{\bar X} = \sum_{\beta=1}^{k}\sum_{\alpha=1}^{k}\left[\frac{x_\alpha + x_\beta}{2}\right]^2 f(x_\alpha)\,f(x_\beta) - \mu^2_{\bar X}.$$

Squaring the quantity in [ ] and summing the resulting three terms just as we
did in passing from (9.20) to (9.21), we find that (9.23) reduces to
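(9.24)   $$\sigma^2_{\bar X} = \frac{1}{4}\sum_{\alpha=1}^{k}x_\alpha^2 f(x_\alpha) + \frac{1}{2}\left[\sum_{\alpha=1}^{k}x_\alpha f(x_\alpha)\right]\left[\sum_{\beta=1}^{k}x_\beta f(x_\beta)\right] + \frac{1}{4}\sum_{\beta=1}^{k}x_\beta^2 f(x_\beta) - \mu^2_{\bar X}.$$

But we know from (5.3b) that

$$\sum_{\alpha=1}^{k}x_\alpha^2 f(x_\alpha) = \sum_{\beta=1}^{k}x_\beta^2 f(x_\beta) = \sigma^2 + \mu^2,$$

and hence

(9.25)   $$\sigma^2_{\bar X} = \frac{\sigma^2 + \mu^2}{4} + \frac{\mu^2}{2} + \frac{\sigma^2 + \mu^2}{4} - \mu^2_{\bar X}.$$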


But we know that $\mu^2_{\bar X} = \mu^2$. Hence (9.25) reduces to

(9.26)   $$\sigma^2_{\bar X} = \frac{\sigma^2}{2}$$

for samples of 2 elements.


By similar reasoning, we would find for samples of n elements that

(9.27)   $$\sigma^2_{\bar X} = \frac{\sigma^2}{n},$$

or

(9.28)   $$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}.$$

This states that the standard deviation of the distribution of means of samples
of n elements from an indefinitely large population is equal to the standard
deviation of the population divided by √n.
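Formulas (9.22) and (9.27) can be verified exactly for a "true" die by summing over every ordered sequence of n independent throws, which is the double sum of (9.20) and (9.23) extended to n sums. The sketch below does this with `itertools.product` and exact fractions for n = 1 to 4; the helper name `xbar_mean_var` is illustrative.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
f = Fraction(1, 6)  # f(x) for a "true" die

mu = sum(x * f for x in FACES)                    # 7/2
sigma2 = sum(x * x * f for x in FACES) - mu ** 2  # 35/12


def xbar_mean_var(n):
    """Exact mean and variance of X-bar over all 6**n ordered, independent throws."""
    prob = f ** n  # multiplication law for independent drawings
    mean = sum(prob * Fraction(sum(d), n) for d in product(FACES, repeat=n))
    second = sum(prob * Fraction(sum(d), n) ** 2 for d in product(FACES, repeat=n))
    return mean, second - mean ** 2


for n in (1, 2, 3, 4):
    m, v = xbar_mean_var(n)
    assert m == mu and v == sigma2 / n  # (9.22) and (9.27)
print(mu, sigma2)
```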

It should be emphasized that formulas (9.22) and (9.28) are essentially
the mean and standard deviation, respectively, of a probability distribution.

Formulas (9.22) and (9.28) were actually derived for the case of sampling
from a population with a discrete chance quantity X. But the same formulas
hold for sampling from a population having a continuous probability distribution
of a chance quantity X. In such a case, the population would have a probability
density function f(x), and μ and σ would be obtained by integration using
formulas (5.14) and (5.15a).

It should be pointed out that the formulas for the mean and standard
deviation of the distribution of the sample sum S(X) in indefinitely many samples
from an indefinitely large population are

$$\mu_{S(X)} = n\mu, \qquad \sigma_{S(X)} = \sigma\sqrt{n}.$$

The validity of these formulas can be seen by letting N increase indefinitely
in formulas (9.6).


9.32 Approximate normality of distribution of sample mean in large samples from
an indefinitely large population.

As in the case of theoretical sampling from a finite population, the
distribution of means of samples from an indefinitely large population can be
approximated by a normal distribution under certain conditions. In fact, we may
make the following statement:

For large values of n the distribution of means of samples of n elements
from an indefinitely large population with mean μ and standard deviation
σ is approximately normal with mean $\mu_{\bar X}$ (= μ) and standard deviation
$\sigma_{\bar X}$ (= σ/√n), the accuracy of the approximation becoming perfect in the limit
as n increases indefinitely.

The accuracy of approximation depends on the size of n, the degree of
lopsidedness or skewness of the population distribution, and other things. But
in some situations the approximation is good for practical purposes for values
of n as small as 10. If the population has a normal distribution with mean μ
and standard deviation σ, then X̄ is exactly normally distributed with mean μ and
standard deviation σ/√n.

The procedure by which the normal distribution is used to approximate
probabilities concerning X̄ or S(X) is very similar to that discussed in
Section 9.24. An example will make this clear.

Example: Suppose a "true" die is rolled 25 times. Approximately what is the probability that the average of the 25 numbers of dots obtained will be less than 4? (This is equivalent to asking: What is the probability that the total number of dots obtained will be less than 100?)
In this problem the population distribution is as follows:

   x      1     2     3     4     5     6
  f(x)   1/6   1/6   1/6   1/6   1/6   1/6

Its mean is μ = 3.5 and its standard deviation is σ = √(35/12) = 1.708, so that σ_X̄ = 1.708/√25 = .342. The relation between Z and X̄ for this problem is

$$Z = \frac{\bar X - 3.5}{.342}.$$

For X̄ = 4, we have Z = (4 − 3.5)/.342 = 1.46, and hence

$$\Pr(\bar X < 4) = \Pr(Z < 1.46).$$

Looking in Table 8.1 and interpolating, we find Pr(Z < 1.46) = .928 as the answer to the question.
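As a check on the die example, the Python sketch below (standard library only) recomputes σ_X̄, Z and the normal approximation. It also obtains the exact probability that the total of 25 throws is less than 100, by convolving the die distribution 25 times. The helper names are illustrative, and the exact value is included only for comparison with the approximation.

```python
import math
from fractions import Fraction


def normal_cdf(z):
    """Standard normal cumulative distribution, computed from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n = 25
mu = 3.5                           # mean of a true die
sigma = math.sqrt(35 / 12)         # standard deviation of a true die
sigma_mean = sigma / math.sqrt(n)  # about .342

# Normal approximation used in the text: Pr(Xbar < 4) = Pr(Z < (4 - mu) / sigma_mean).
z = (4 - mu) / sigma_mean
approx = normal_cdf(z)

# Exact value for comparison: Pr(total of 25 dice < 100), by convolution.
dist = {0: Fraction(1)}
for _ in range(n):
    new = {}
    for s, fs in dist.items():
        for face in range(1, 7):
            new[s + face] = new.get(s + face, 0) + fs * Fraction(1, 6)
    dist = new
exact = float(sum(f for s, f in dist.items() if s < 100))

print(f"sigma_mean = {sigma_mean:.3f}, z = {z:.2f}")
print(f"normal approximation = {approx:.3f}, exact = {exact:.3f}")
```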


9.33 Remarks on the binomial distribution as a theoretical sampling distribution.

The binomial distribution given by formula (6.3) is essentially a theoretical sampling distribution; it is the sampling distribution of the sample sum S(X) in samples of n elements from an indefinitely large population in which the chance quantity X has the following distribution:

   x      0    1
  f(x)    q    p

The occurrence of event E corresponds to x = 1, and the occurrence of "not E" corresponds to x = 0. The mean μ of this population is seen to be 0·q + 1·p = p, i.e.,

$$\mu = p,$$

and the variance of the distribution is

$$(0-p)^2 q + (1-p)^2 p = p^2 q + q^2 p = pq(p+q) = pq,$$

i.e.,

$$\sigma^2 = pq.$$

If samples of n elements are drawn from this population, then S(X), the sum of the X's in the sample, is simply the total number of 1's in the sample (i.e., the number of times event E occurred in the sample of n elements or trials). Actually, formula (6.3) gives the probability that S(X) has the value x; it follows from formula (9.19) that the mean and variance of S(X) for the case of samples from the binomial population having distribution

   x      0    1
  f(x)    q    p

are given by

$$\mu_{S(X)} = np \qquad\text{and}\qquad \sigma^2_{S(X)} = npq,$$

respectively. These values of the mean and variance of a binomial distribution were established by a direct, but more cumbersome, argument in Section 6.2.

If we are interested in the mean X̄ (the proportion of trials in which event E occurs), we find that its distribution has the following mean and standard deviation:

$$\mu_{\bar X} = p, \qquad \sigma_{\bar X} = \sqrt{\frac{pq}{n}}. \tag{9.30}$$

You should notice that not only do we know the mean and variance of X̄ and S(X) in samples of n elements from the binomial population having the distribution mentioned above, but we also know the exact sampling distribution of X̄ and S(X). This distribution is, in fact, the binomial distribution

$$f(x) = C^n_x\, p^x (1-p)^{n-x}.$$

More precisely, this formula gives the probability that S(X) = x, or that X̄ = x/n.
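The claim that X̄ = S(X)/n, under this binomial distribution, has mean p and variance pq/n as in (9.30) can be verified with exact rational arithmetic. In this Python sketch the values n = 12 and p = .3 are arbitrary illustrations, not taken from the text.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """C(n, x) p^x (1 - p)^(n - x): the probability that S(X) = x."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def moments_of_mean(n, p):
    """Total probability, exact mean and exact variance of Xbar = S(X)/n."""
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(Fraction(x, n) * f for x, f in enumerate(probs))
    var = sum((Fraction(x, n) - mean) ** 2 * f for x, f in enumerate(probs))
    return sum(probs), mean, var


n, p = 12, Fraction(3, 10)  # illustrative values, not from the text
total, mean, var = moments_of_mean(n, p)
print(total, mean, var)  # → 1 3/10 7/400, i.e. total 1, mean p, variance pq/n
```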

Exercise 9.3.

1. Suppose the net weight of individual packages in a population of "half-pound" packages has mean .51 lb. and standard deviation .02 lb., and that the packages are put up in lots of 2500 packages. What proportion of the lots can be expected to weigh more than 1276 pounds net? Between 1273 and 1277 pounds net weight?

2. When a one-foot ruler and a pencil are repeatedly used to mark off "one-foot lengths," suppose, in fact, that a population (considered indefinitely large) of lengths is generated with mean 1.001 feet and standard deviation .005 ft. If a distance of 100 ruler lengths is marked off, what is the probability that the distance marked off will be less than 100 ft.? Will lie between 100 and 100.2 feet?

3. Suppose the population of men travelling in airplanes from a large city has a distribution of gross weights with mean 162 lb. and standard deviation 7 lb. What is the approximate probability that a DC-3 load of 21 men passengers would have a combined gross weight of more than 3500 lb.?

4. Eight persons sitting around a table are provided with 10 matches each. Each person takes some number of matches from 1 to 10 and clenches them in his hand. At a given instant everyone shows how many matches he is holding. Approximately what is the probability that the number of matches that will turn up will lie between 36 and 45 inclusive?

5. If a hand of 13 cards is dealt from a pack of 52 playing cards, the probability (to 3 decimal places) of getting 0, 1, 2, 3, 4 aces is given by the following probability distribution:

   x       0      1      2      3      4
  f(x)   .304   .439   .213   .041   .003

In playing 100 hands of bridge, approximately what is the probability of getting a total of less than 90 aces? More than 120 aces?

6. Suppose pamphlets are counted out in packets of 25 by weighing them. Suppose the distribution of weights of individual pamphlets has mean 1 oz. and standard deviation .05 oz. Any weighed pile of pamphlets is "counted" as 25 pamphlets only if it makes the scales read between 24.5 and 25.5 ounces. What is the probability that a pile of 24 pamphlets will be "counted" as 25 pamphlets by this method? That a pile of 25 pamphlets will not be "counted" as 25 pamphlets?

7. If 60% of the voters of a certain city (the population of voters to be considered indefinitely large) are in favor of a given change, what is the probability that a random sample of 100 of the voters would not show a majority in favor of the change?

8. Suppose certain numbers are calculated to four decimal places. If we drop the fourth decimal (which means we drop .000x, where x = 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9) and add the 25 resulting 3-decimal numbers, what is the approximate probability that the total of the dropped numbers will exceed .01?

9. Suppose breaking strengths of individual pieces of a certain type of plastic fiber have a distribution with mean 2.5 lb. and standard deviation .2 lb. What is the probability that a bundle of 100 of these fibers would support a weight of 255 lb.?

9.4 The Theoretical Sampling Distribution of Sums and Differences of Sample Means.

In statistical problems, we frequently have to deal with the difference between the means of two samples, or the sum or average (weighted or unweighted) of two or more sample means. In this section we shall discuss briefly the sampling theory of such sums and differences.

9.41 Differences of sample means.

The main results for differences between sample means which we want to consider may be stated as follows:

Suppose X̄ is the mean of a sample of n elements drawn from a population having mean μ and variance σ², and X̄' is the mean of an independent sample of n' elements from a population with mean μ' and variance σ'². Then the mean of the distribution of the difference of means X̄ − X̄' is given by


$$\mu_{\bar X - \bar X'} = \mu_{\bar X} - \mu_{\bar X'} = \mu - \mu'. \tag{9.31}$$


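This formula holds for finite or indefinitely large populations. The variance of the distribution of the difference of means is given by

$$\sigma^2_{\bar X - \bar X'} = \sigma^2_{\bar X} + \sigma^2_{\bar X'} \tag{9.32}$$

or

$$\sigma^2_{\bar X - \bar X'} = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} + \frac{\sigma'^2}{n'}\cdot\frac{N'-n'}{N'-1} \tag{9.33}$$

in case of finite populations, where N and N' are the numbers of elements in the two populations respectively, and

$$\sigma^2_{\bar X - \bar X'} = \frac{\sigma^2}{n} + \frac{\sigma'^2}{n'} \tag{9.34}$$

in case of indefinitely large populations.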

As in the case of single sample means, the difference between two sample means is, under certain conditions, approximately normally distributed with mean (μ − μ') and variance given by (9.33) in case of finite populations or (9.34) in case of indefinitely large populations. The main conditions under which the approximation is valid are similar to those for approximate normality of the distribution of means of single samples, namely large n and much larger N, and also large n' and much larger N'.

Let us consider an example. 

Example: Suppose 80% of the population (considered indefinitely large) of voters in a certain city is in favor of a certain proposal. Two samples of 100 voters each are polled. Approximately what is the probability that the difference between the percentages favoring the proposal will exceed 10%?

Here we are drawing both samples from the same population. The chance quantity X has the value 1 for a voter if he favors the proposal and 0 otherwise, so that μ = p = .8 and σ² = pq = .16. Hence we have from (9.30)

$$\mu_{\bar X} = \mu_{\bar X'} = .8, \qquad \sigma_{\bar X} = \sigma_{\bar X'} = \sqrt{\frac{.16}{100}} = .04.$$

Hence from (9.31)

$$\mu_{\bar X - \bar X'} = .8 - .8 = 0,$$

and from (9.32)

$$\sigma^2_{\bar X - \bar X'} = 2(.04)^2 = .0032, \qquad \sigma_{\bar X - \bar X'} = .057.$$

The question to be answered in the example may be expressed as follows: What is the value of Pr(|X̄ − X̄'| > .1)? Now Pr(|X̄ − X̄'| > .1) = 1 − Pr(−.1 < X̄ − X̄' < .1).


For this problem we have

$$Z = \frac{(\bar X - \bar X') - \mu_{\bar X - \bar X'}}{\sigma_{\bar X - \bar X'}} = \frac{\bar X - \bar X'}{.057}.$$

The values of Z for X̄ − X̄' = .1 and −.1 are about 1.76 and −1.76. Hence, we find from Table 8.1

$$\Pr(-.1 < \bar X - \bar X' < .1) = \Pr(-1.76 < Z < 1.76) = .9216.$$

The probability is approximately 1 − .9216 = .0784 that the percentages of voters favorable to the proposal in the two samples will differ by more than 10%.
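The arithmetic of the voter example can be reproduced as follows. This Python sketch uses math.erf for the normal distribution function in place of Table 8.1 and carries full precision (standard deviation of the difference .0566 rather than .057), so its answer, about .077, differs slightly from the rounded hand calculation.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p, n1, n2 = 0.8, 100, 100
var_pop = p * (1 - p)                             # sigma^2 = pq = .16
sd_diff = math.sqrt(var_pop / n1 + var_pop / n2)  # (9.34): about .0566
z = 0.1 / sd_diff
prob_exceed = 2 * (1 - normal_cdf(z))             # Pr(|Xbar - Xbar'| > .1)
print(f"sd = {sd_diff:.4f}, z = {z:.3f}, Pr = {prob_exceed:.4f}")
```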

9.42 Sums of sample means.

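If we take the sum of the sample means, this sum will have a distribution with the following mean and variance:

$$\mu_{\bar X + \bar X'} = \mu_{\bar X} + \mu_{\bar X'}, \tag{9.35}$$

$$\sigma^2_{\bar X + \bar X'} = \sigma^2_{\bar X} + \sigma^2_{\bar X'}. \tag{9.36}$$

Note that the formula for σ²_{X̄+X̄'} is the same as that for σ²_{X̄−X̄'}.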

As a matter of fact, if we take any linear combination aX̄ + a'X̄', where a and a' are any given constants, this linear combination will have the following mean and variance:

$$\mu_{a\bar X + a'\bar X'} = a\mu_{\bar X} + a'\mu_{\bar X'}, \tag{9.37}$$

$$\sigma^2_{a\bar X + a'\bar X'} = a^2\sigma^2_{\bar X} + a'^2\sigma^2_{\bar X'}. \tag{9.38}$$

Derivations.

We can derive the basic formulas (9.31), (9.32), (9.35) and (9.36) by considering two general discrete chance quantities X and X'. These derivations are given below for those who want to see how the argument proceeds. Similar derivations can be made by using continuous chance quantities X and X'. In this case the summations involved in the derivation would be replaced by integrations.

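Suppose X is a discrete chance quantity having the probability distribution

$$\begin{array}{c|cccc} x & x_1 & x_2 & \cdots & x_k \\ \hline f(x) & f(x_1) & f(x_2) & \cdots & f(x_k) \end{array}$$

and X' is a discrete chance quantity having the probability distribution

$$\begin{array}{c|cccc} x' & x'_1 & x'_2 & \cdots & x'_{k'} \\ \hline f'(x') & f'(x'_1) & f'(x'_2) & \cdots & f'(x'_{k'}) \end{array}$$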

Let the mean and standard deviation of X be μ and σ, and those for X' be μ' and σ', respectively. Suppose that X and X' are independent chance quantities, i.e., assume that the probability of X taking on any value x_α and X' taking on any value x'_β is equal to the product of the two probabilities, i.e.,

$$\Pr(X = x_\alpha \text{ and } X' = x'_\beta) = f(x_\alpha)\, f'(x'_\beta). \tag{9.39}$$


Now consider the chance quantity L = aX + a'X'. It will have a distribution of its own, for the probability that L has a particular value l is obtained by adding the values of all products f(x_α)f'(x'_β) for which ax_α + a'x'_β = l. The mean of the distribution of L is obtained by multiplying (ax_α + a'x'_β) by f(x_α)f'(x'_β) and summing over all values of α and of β, i.e.,

$$\mu_L = \sum_{\beta=1}^{k'}\sum_{\alpha=1}^{k} (a x_\alpha + a' x'_\beta)\, f(x_\alpha) f'(x'_\beta). \tag{9.40}$$

But this may be expressed as two sums:

$$\mu_L = a\sum_{\beta=1}^{k'}\sum_{\alpha=1}^{k} x_\alpha f(x_\alpha) f'(x'_\beta) + a'\sum_{\beta=1}^{k'}\sum_{\alpha=1}^{k} x'_\beta f(x_\alpha) f'(x'_\beta). \tag{9.41}$$

Rearranging summation signs,

$$\mu_L = a\Big[\sum_{\alpha=1}^{k} x_\alpha f(x_\alpha)\Big]\Big[\sum_{\beta=1}^{k'} f'(x'_\beta)\Big] + a'\Big[\sum_{\beta=1}^{k'} x'_\beta f'(x'_\beta)\Big]\Big[\sum_{\alpha=1}^{k} f(x_\alpha)\Big]. \tag{9.42}$$

But the quantity in the first [ ] is the mean μ of X, that in the second [ ] is 1, that in the third is the mean μ' of X', and that in the fourth is 1. Therefore (9.42) reduces to

$$\mu_L = a\mu + a'\mu', \tag{9.43}$$

which states that if X and X' are any two independent chance quantities having means μ and μ' respectively, then the mean μ_L of the distribution of the linear combination L = aX + a'X' has the value aμ + a'μ'.

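Now let us consider the variance of L. By the definition of a variance (see (5.3b)) it is given by

$$\sigma_L^2 = \sum_{\beta=1}^{k'}\sum_{\alpha=1}^{k} (a x_\alpha + a' x'_\beta)^2 f(x_\alpha) f'(x'_\beta) - \mu_L^2. \tag{9.44}$$

Squaring the quantity in the parentheses, we have

$$\sigma_L^2 = \sum_{\beta=1}^{k'}\sum_{\alpha=1}^{k} \left(a^2 x_\alpha^2 + 2aa' x_\alpha x'_\beta + a'^2 x'^2_\beta\right) f(x_\alpha) f'(x'_\beta) - \mu_L^2. \tag{9.45}$$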


Rearranging the summation signs, and remembering that

$$\sum_{\alpha=1}^{k} x_\alpha f(x_\alpha) = \mu, \qquad \sum_{\beta=1}^{k'} x'_\beta f'(x'_\beta) = \mu',$$

$$\sum_{\alpha=1}^{k} x_\alpha^2 f(x_\alpha) = \sigma^2 + \mu^2, \qquad \sum_{\beta=1}^{k'} x'^2_\beta f'(x'_\beta) = \sigma'^2 + \mu'^2, \qquad \mu_L = a\mu + a'\mu',$$

we find

$$\sigma_L^2 = a^2(\sigma^2 + \mu^2) + 2aa'\mu\mu' + a'^2(\sigma'^2 + \mu'^2) - (a\mu + a'\mu')^2. \tag{9.46}$$

Simplifying, we find

$$\sigma_L^2 = a^2\sigma^2 + a'^2\sigma'^2, \tag{9.47}$$

which states that if X and X' are any two independent chance quantities having variances σ² and σ'² respectively, then the variance of the linear combination L = aX + a'X' has the value a²σ² + a'²σ'².
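Formulas (9.43) and (9.47) can also be confirmed by brute force: list every pair (x_α, x'_β), give it probability f(x_α)f'(x'_β) as in (9.39), and compute the mean and variance of L directly. The two small distributions in this Python sketch are made up for illustration; any others would do.

```python
from fractions import Fraction
from itertools import product

# Two illustrative discrete distributions (assumptions, not from the text).
F = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
F_PRIME = {-1: Fraction(1, 3), 2: Fraction(2, 3)}


def mean_var(dist):
    """Exact mean and variance of a distribution {value: probability}."""
    mu = sum(x * f for x, f in dist.items())
    return mu, sum((x - mu) ** 2 * f for x, f in dist.items())


def linear_combination(a, a_prime):
    """Distribution of L = aX + a'X' when X and X' are independent, as in (9.39)."""
    dist = {}
    for (x, fx), (xp, fxp) in product(F.items(), F_PRIME.items()):
        value = a * x + a_prime * xp
        dist[value] = dist.get(value, 0) + fx * fxp  # pairs giving the same l are added
    return dist


mu, var = mean_var(F)
mu_p, var_p = mean_var(F_PRIME)
for a, a_prime in [(1, -1), (1, 1), (2, 5)]:
    mu_L, var_L = mean_var(linear_combination(a, a_prime))
    print(a, a_prime,
          mu_L == a * mu + a_prime * mu_p,             # checks (9.43)
          var_L == a**2 * var + a_prime**2 * var_p)    # checks (9.47)
```

The case a = 1, a' = −1 corresponds to the difference X − X' used in (9.31) and (9.32).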

Now let us return to the sampling theory of the differences and sums of means. Formula (9.31) follows immediately from (9.43), which holds for any two chance quantities X and X'. If we replace X by X̄, X' by X̄', a by 1 and a' by −1, then μ is replaced by μ_X̄, μ' is replaced by μ_X̄', aX + a'X' becomes X̄ − X̄', and (9.43) reduces to (9.31). In a similar manner it can be seen that (9.35) and (9.37) are special cases of (9.43).

Similarly, formula (9.32) is a special case of (9.47), as one will see by replacing X by X̄ and X' by X̄' and letting a = 1 and a' = −1. Also (9.36) and (9.38) are special cases of (9.47).

The extension of formulas (9.43) and (9.47) to the case of a linear combination of any (finite) number of independent chance quantities is straightforward. For instance, in the case of three independent chance quantities X, X', X'', with means and standard deviations μ, σ; μ', σ'; μ'', σ''; respectively, the mean and variance of the linear combination L = aX + a'X' + a''X'' are

$$\mu_L = a\mu + a'\mu' + a''\mu''$$

and

$$\sigma_L^2 = a^2\sigma^2 + a'^2\sigma'^2 + a''^2\sigma''^2,$$

respectively.

Exercise 9.4. 


1. A rolls a die 100 times and B rolls a die 100 times. What is the approximate probability that A will get a total of at least 25 points more than B?

2. The population of Scholastic Aptitude Test scores has mean 500 and standard deviation 100. Approximately what is the probability that the mean score of a randomly selected group of 50 students will exceed the mean score of a randomly selected group of 25 students by at least 10 points?

3. In problem No. 6 of Exercise 9.3, suppose two packets of 25 booklets are weighed. What is the approximate probability that the difference between the weights will be less than 1/2 oz.?


CHAPTER 10. CONFIDENCE LIMITS OF POPULATION PARAMETERS


10.1 Introductory Remarks.

In Chapter 9 we discussed the rudiments of sampling theory with special reference to means of samples. In all cases we started with a specified population and considered the problem of finding out something about the distribution of means of samples from that population. In particular, we found formulas for obtaining the mean and standard deviation of such a distribution of sample means; we stated (without mathematical proof) that, for large samples, the distribution of means is almost a normal distribution; and we showed how to calculate, from a normal distribution, approximate probabilities that the mean of a random sample would fall in any specified interval. Similar results were presented for differences between sample means.

Thus, in sampling theory, we start from a population having a known distribution and deal with the problem of calculating probabilities about samples (e.g., about their means) from it. In practical statistical situations we start from known samples and have to deal with the problem of making inferences about the population (e.g., estimating its mean) from which the sample was drawn. In this problem of statistical inference we make use of sampling theory as a tool for drawing conclusions from a sample about a population.

In this chapter we shall consider the problem of finding confidence limits of population parameters on the basis of sample sums and means for various kinds of problems.

10.2 Confidence Limits of p in a Binomial Distribution.

In Section 8.2 we stated that under certain conditions (large n and value of p not too "near" 0 or 1, particularly) the binomial distribution can be approximated by a normal distribution. More specifically, if X is a chance quantity (number of times an event E occurred in n trials) distributed according to the binomial distribution

$$f_B(x) = C^n_x\, p^x q^{n-x}$$

for x = 0, 1, 2, ..., n, then X is approximately normally distributed with mean np and standard deviation √(npq). This means that for any value of z, the probability of X being less than np + z√(npq) is given approximately by the value of F_N(z) for that value of z in Table 8.1. For example, if z = 2.0,

$$\Pr(X < np + 2\sqrt{npq}) \approx .9773.$$

Similarly

$$\Pr(np - 2\sqrt{npq} < X < np + 2\sqrt{npq}) \approx .9773 - .0227 = .9546.$$


If we knew the values of p and n, we would have specific numerical values for np ± 2√(npq) and hence a specific interval such that the probability is approximately .9546 that X would lie in that interval. For example, if n = 400, p = .2, q = .8, then np ± 2√(npq) = 80 ± 16 and the interval would be (64, 96). This means that, if X is a chance quantity having the binomial distribution

$$f(x) = C^{400}_x (.2)^x (.8)^{400-x},$$

the probability is about .9546 that X would fall between 64 and 96 inclusive.
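How good is the figure .9546 in this case? The Python sketch below, an illustration rather than part of the text's method, sums the exact binomial probabilities for X = 64, ..., 96 when n = 400 and p = .2, so that the result can be compared with the normal approximation.

```python
from math import comb, sqrt

n, p = 400, 0.2
q = 1 - p
center, half_width = n * p, 2 * sqrt(n * p * q)   # 80 and 16
lo, hi = round(center - half_width), round(center + half_width)

# Exact binomial probability that X falls between lo and hi inclusive.
exact = sum(comb(n, x) * p**x * q ** (n - x) for x in range(lo, hi + 1))
print(lo, hi, round(exact, 4))
```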

Now suppose we know that X is a chance quantity which has the binomial distribution

$$f(x) = C^{400}_x\, p^x q^{400-x},$$

in which the value of p is unknown, and that in an experiment (of 400 trials) we found X to be 280. What can we say about the value of p on the basis of this sample evidence? Can we make some kind of estimate of the value of p? The answer is that for a designated "degree of confidence" we can establish a confidence interval on the basis of the sample evidence, such that we can say with the given degree of confidence that this interval contains the value of p.

Suppose we choose a probability of .9546 as the designated "degree of confidence". More precisely, we refer to .9546 as the confidence coefficient. Now we know from the preceding discussion that whatever value p may have, it is true that

$$\Pr(400p - 2\sqrt{400pq} < X < 400p + 2\sqrt{400pq}) \approx .9546. \tag{10.1}$$


The double inequality in the parentheses is equivalent to the following double inequality (as will be seen by subtracting 400p from each of the three members of the inequality):

$$-2\sqrt{400pq} < X - 400p < 2\sqrt{400pq}. \tag{10.1a}$$

Dividing each member of this inequality by √(400pq), we obtain the following double inequality

$$-2 < \frac{X - 400p}{\sqrt{400pq}} < 2. \tag{10.2}$$

Hence, probability statement (10.1) is equivalent to the following probability statement

$$\Pr\left(-2 < \frac{X - 400p}{\sqrt{400pq}} < 2\right) \approx .9546. \tag{10.3}$$

Now for a given value of X in a sample of 400 cases, there is a range of values which p can have such that the inequality (10.2) is satisfied. How can we find this range of values of p? Simply by setting

$$\frac{X - 400p}{\sqrt{400p(1-p)}} = -2 \qquad\text{and}\qquad \frac{X - 400p}{\sqrt{400p(1-p)}} = 2 \tag{10.5}$$

and solving for p. The two values of p that will be found are called the 95.46% confidence limits of p for this problem. To find these values of p we square the two equations and note that in either case we get

$$\frac{(X - 400p)^2}{400(p - p^2)} = 4,$$

which reduces to

$$(X - 400p)^2 = 1600(p - p^2). \tag{10.7}$$

(10.7) is a quadratic equation in p, and may be rewritten as

$$161600\,p^2 - (800X + 1600)\,p + X^2 = 0. \tag{10.8}$$
Now the solutions of a general quadratic of the form

$$ap^2 + bp + c = 0$$

are

$$p = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{10.10}$$

Applying this formula to the quadratic equation (10.8) and simplifying a bit, we obtain

$$p = \frac{X + 2 \pm \sqrt{4X + 4 - .01X^2}}{404}. \tag{10.11}$$

The two values of p given by (10.11) are the two 95.46% confidence limits of p.

It should be realized that what we have done amounts to finding another way to express the double inequality in (10.2). That inequality is equivalent to the following one:

$$\frac{X + 2 - \sqrt{4X + 4 - .01X^2}}{404} < p < \frac{X + 2 + \sqrt{4X + 4 - .01X^2}}{404}. \tag{10.12}$$

Hence we may rewrite (10.1) as follows:

$$\Pr\left(\frac{X + 2 - \sqrt{4X + 4 - .01X^2}}{404} < p < \frac{X + 2 + \sqrt{4X + 4 - .01X^2}}{404}\right) \approx .9546. \tag{10.13}$$

Written in this form the confidence limits of p show up explicitly.

To recapitulate, we may say this: If repeated samples of 400 cases are drawn from the binomial population having the distribution

$$f(x) = p^x q^{1-x}, \qquad x = 0, 1,$$

the 95.46% confidence limits in (10.13) will vary from sample to sample, since they depend on X, the quantity which varies from sample to sample. For some samples the confidence limits will include the value of the unknown parameter p between them, and for others the confidence limits will not include the value of p between them. But the important point is this: in about 95.46% of the samples, in the long run, the confidence limits will include the unknown value of p between them.
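The long-run interpretation can be illustrated by simulation. This Python sketch assumes a hypothetical true value p = .7, draws many samples of 400 trials, computes the limits (10.11) for each, and reports the fraction of samples whose limits enclose p. That fraction should come out near .9546. The seed and the number of repetitions are arbitrary choices.

```python
import math
import random

N_TRIALS = 400  # sample size for which (10.11) was derived


def limits_400(x):
    """95.46% confidence limits of p from (10.11), valid for n = 400 and z = 2."""
    root = math.sqrt(4 * x + 4 - 0.01 * x * x)
    return (x + 2 - root) / 404, (x + 2 + root) / 404


def coverage(p_true, repetitions=2000, seed=1):
    """Fraction of simulated samples whose limits enclose p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        x = sum(rng.random() < p_true for _ in range(N_TRIALS))  # one binomial sample
        lo, hi = limits_400(x)
        hits += lo < p_true < hi
    return hits / repetitions


print(limits_400(280))  # the limits for the observed X = 280
print(coverage(0.7))    # should be near .9546
```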
Remember that in the original question we found X = 280 in the sample of 400. Hence, for this particular sample the two confidence limits are obtained by putting X = 280 in (10.11). We find

$$p = .698 \pm .046.$$

Hence, we say that the unknown value of p is included between the two values .698 ± .046. We repeat: we can attach about 95.46% confidence to this statement, in the sense that, if we were to repeat the procedure for many other samples, then for about 95.46% of the samples the value of p would be included between the values of the confidence limits for those samples.

We may make the foregoing discussion more concrete by summarizing in the form of an example.

Example: Suppose 400 voters are selected at random from a large city (considered to have an indefinitely large population of voters) and are asked whether they are in favor of Candidate A as a presidential candidate. Suppose 280 voters say "yes". What are the 95.46% confidence limits of the proportion of voters in the city who would say "yes" if they were asked?

In this case the proportion of voters in the city who would say "yes" if asked is p and is obviously unknown. If X is the number who would say "yes" in a sample of 400, then X would theoretically have a binomial distribution if repeated samples of 400 were taken. In a single sample X = 280. The 95.46% confidence limits of p are found by the foregoing procedure to be

$$.698 \pm .046,$$

i.e., .652 and .744. This means that we can state on the basis of our sample with "about 95.46% confidence" that the percent of voters who would say "yes" would lie between 65.2 and 74.4.

The procedure we have just discussed can be extended to the case of a general sample of size n (large) from a binomial population and a general confidence coefficient α. In this case we would have a chance quantity X distributed according to the binomial distribution

$$C^n_x\, p^x (1-p)^{n-x}. \tag{10.14}$$

Then in place of (10.1) we would have

$$\Pr(np - z_\alpha\sqrt{npq} < X < np + z_\alpha\sqrt{npq}) \approx \alpha, \tag{10.15}$$
where values of z_α can be obtained from Table 8.1 for any value of α to be considered. In practice, values of α from .9 to .99 are used most widely. Table 10.1 shows the values of α and z_α most commonly used.

TABLE 10.1
Values of z_α

  Confidence coefficient α     z_α
  .9000                        1.645
  .9500                        1.960
  .9546                        2.000
  .9900                        2.576
  .9973                        3.000
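The entries of Table 10.1 follow from the relation Pr(−z_α < Z < z_α) = α, i.e., z_α is the point with probability (1 + α)/2 below it. The Python sketch below recomputes them with statistics.NormalDist from the standard library. Differences in the third decimal reflect rounding of the tabled α values (for instance, Pr(−2 < Z < 2) is .9545 to four places).

```python
from statistics import NormalDist

TABLE_10_1 = {0.9000: 1.645, 0.9500: 1.960, 0.9546: 2.000, 0.9900: 2.576, 0.9973: 3.000}


def z_alpha(alpha):
    """Value z_alpha with Pr(-z_alpha < Z < z_alpha) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf((1 + alpha) / 2)


for alpha, tabled in TABLE_10_1.items():
    print(f"{alpha:.4f}  computed {z_alpha(alpha):.4f}  tabled {tabled:.3f}")
```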


In place of (10.3) we would have

$$\Pr\left(-z_\alpha < \frac{X - np}{\sqrt{npq}} < +z_\alpha\right) \approx \alpha. \tag{10.16}$$

The two confidence limits of p are given by solving the two equations

$$\frac{X - np}{\sqrt{np(1-p)}} = \pm z_\alpha. \tag{10.17}$$

Squaring both sides and collecting terms, we obtain the quadratic equation

$$p^2(n^2 + nz_\alpha^2) - p(2nX + nz_\alpha^2) + X^2 = 0. \tag{10.18}$$

The two values of p obtained by solving this equation are the confidence limits of p for confidence coefficient α.
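Solving (10.18) with the quadratic formula gives both limits at once. This Python sketch (the function name is illustrative) reproduces the limits .652 and .744 of the worked example with n = 400, X = 280, z_α = 2, and can be applied to any other X, n and z_α from Table 10.1, for example X = 240 in n = 1000 trials with z_α = 1.96.

```python
import math


def binomial_confidence_limits(x, n, z):
    """Roots of (10.18): p^2 (n^2 + n z^2) - p (2 n x + n z^2) + x^2 = 0."""
    a = n * n + n * z * z
    b = -(2 * n * x + n * z * z)
    c = x * x
    root = math.sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


# The worked example of Section 10.2: X = 280, n = 400, z = 2.
print(binomial_confidence_limits(280, 400, 2.0))
# A 95% case (z = 1.96) with X = 240 successes in n = 1000 trials.
print(binomial_confidence_limits(240, 1000, 1.96))
```

With n = 400 and z = 2 the coefficients are 161600, −(800X + 1600) and X², which is exactly (10.8).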

10.21 Confidence interval chart for p.

Figure 10.1 shows a confidence interval chart for determining graphically the two solutions of (10.18) for p; the chart can be used for any given value of p̂ = X/n (the relative frequency of "successes" in the sample), for various values of n, and for α = .95. In this case z_α = 1.96.

To illustrate the use of the chart, suppose a sample of 1000 interviews is taken at random from the voting population (considered indefinitely large) of a large city. Suppose 240 of the voters answer "yes" to a certain question. What are 95% confidence limits of the percentage of voters in this population who would answer "yes" to the question if asked? Here p̂ = 240/1000 = .24 and n = 1000. Looking at the ordinates in Figure 10.1 determined by the two curves marked n = 1000, for p̂ = .24, we find .22 and .27 as the required limits.

[Figure 10.1. Chart for Determining 95% Confidence Limits of p in a Binomial Distribution, for n = 10, 15, 20, 30, 50, 100, 250, 1000 (axis: Scale of p). Reprinted by permission of the authors, C. J. Clopper and E. S. Pearson, and the publishers, the Biometrika Office.]


10.22 Remarks on sampling from a finite binomial population.

Suppose we are sampling from a finite binomial population, i.e., one in which there are only two kinds of elements: A and B. Suppose p is the fraction of A's and q is the fraction of B's. Then there are Np A's and Nq B's in the population. Suppose a sample of n elements is drawn from this population, and let X = number of A's in the sample. Then it follows from (9.6) that X has a distribution which has mean np and standard deviation √(npq(N − n)/(N − 1)). For n large and N much larger, X is approximately normally distributed with this mean and standard deviation. It follows that we have a way of constructing confidence limits of p in the case of a large finite population. We would proceed by replacing npq in (10.16) by npq(N − n)/(N − 1) and thus obtain

$$\frac{X - np}{\sqrt{np(1-p)\,\dfrac{N-n}{N-1}}} = \pm z_\alpha$$

as the equations to solve for p. The resulting solutions for p would be the 100α% confidence limits.
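The finite-population equations differ from (10.17) only in that z_α² is multiplied by (N − n)/(N − 1), so the quadratic (10.18) still applies with z_α² replaced by c = z_α²(N − n)/(N − 1). The Python sketch below assumes hypothetical numbers (150 A's in a sample of 500 from a population of 2000) and shows that the resulting limits are narrower than those for an indefinitely large population.

```python
import math


def finite_population_limits(x, n, big_n, z):
    """Solve (X - np)^2 = z^2 n p (1 - p) (N - n)/(N - 1) for p.

    This is (10.18) with z^2 replaced by c = z^2 (N - n)/(N - 1).
    """
    c = z * z * (big_n - n) / (big_n - 1)
    a = n * n + n * c
    b = -(2 * n * x + n * c)
    root = math.sqrt(b * b - 4 * a * x * x)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


# Hypothetical numbers: 150 A's in a sample of 500 from a population of 2000.
print(finite_population_limits(150, 500, 2000, 1.96))
print(finite_population_limits(150, 500, 10**9, 1.96))  # nearly an infinite population
```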


Exercise 10.2.

1. Suppose a new Roosevelt dime is tossed 10,000 times and turns up heads 5,270 times. Construct 95% confidence limits for p, the probability of getting a head with this dime.

2. If 75 out of a random sample of 225 telephone-subscribing residences in Princeton do not respond to a telephone call between 7:00 and 8:00 p.m. on a particular evening, what would you construct as the 90% confidence limits of the percentage of telephone-subscribing homes having someone at home during those hours? (Assume no answer means no one is at home, and consider the population as indefinitely large.)
3. In a random sample of n voters of State K suppose 51% of the ballots cast are for Candidate A. How large should n be in order for the smaller of the two 99% confidence limits of the percent for A in the population of voters of State K to be 50%?

4. Suppose that at a certain time there are 3600 Princeton undergraduates. A random sample of 400 shows 240 in favor of a certain student proposal. Establish 95% confidence limits of the percent of students in the entire undergraduate body in favor of the proposal.

5. Construct 90% confidence limits of p, the probability that the type of thumb tack discussed in Section 6.3 will fall point up, from the data presented in Table 6.3.

10.3 Confidence Limits of Population Means Determined from Large Samples.

In Section 9.32 it was stated that in large samples of n elements from an indefinitely large population having mean μ and standard deviation σ, the sample mean X̄ is approximately normally distributed with mean μ and standard deviation σ/√n. This means that for a given α we have

$$\Pr\left(-z_\alpha < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < z_\alpha\right) \approx \alpha. \tag{10.19}$$

If the population distribution were such that μ and σ were each expressible in terms of a single parameter θ, then one could find confidence limits of θ by taking the end points of the range of all possible values of θ for which the inequality in (10.19) holds. These limits would involve X̄ and n in general.

A simple example will make this clear. 

Example: Suppose X is a continuous chance quantity such that all values of X on the interval (0, θ) are equally likely, where θ is unknown. (This is equivalent to the simple scale problem of Section 5.21, where the length of the scale is θ rather than 1.) Suppose a sample of 20 values of X is drawn, and the sample mean X̄ has the value 3.2. What are the 90% confidence limits of θ?

In this problem the probability density function is f(x) = 1/θ over the interval (0, θ). Hence μ = θ/2 and σ = θ/√12, so that both are expressible in terms of the single parameter θ.
$$\bar X \pm z_\alpha \frac{\sigma}{\sqrt{n}} \tag{10.23}$$

As a matter of fact, if σ is unknown and n is large, we may replace σ in (10.23) by the sample standard deviation s, and obtain

$$\bar X \pm z_\alpha \frac{s}{\sqrt{n}} \tag{10.24}$$

as approximate 100α% confidence limits of the population mean μ.

Example: A sample of 100 washers is taken at random from a stamping machine during a certain afternoon and the thicknesses of the washers are measured. The mean and standard deviation of the thicknesses of these 100 washers are .1022" and .0021" respectively. What are (approximate) 90% confidence limits of the mean of the population of washers turned out that afternoon?

Here X̄ = .1022, s = .0021, and n = 100. For 90% confidence limits we see from Table 10.1 that z_α = 1.645. Hence the (approximate) 90% confidence limits of the population mean μ are

$$.1022 \pm (1.645)\frac{.0021}{\sqrt{100}} = .1022 \pm .00035.$$

Thus, the probability is about .90 that the population mean μ is included between .10185" and .10255".
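The washer calculation is formula (10.24) with z_α = 1.645. A minimal Python version, using only the numbers given in the example:

```python
import math

x_bar, s, n = 0.1022, 0.0021, 100   # washer data from the example (inches)
z = 1.645                           # 90% value from Table 10.1

half_width = z * s / math.sqrt(n)   # (10.24)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"half-width = {half_width:.6f}")
print(f"limits: {lower:.5f} to {upper:.5f} inches")
```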


10.31 Remarks about confidence limits of means of finite populations.

Sometimes we have to consider the problem of determining confidence limits of the mean μ of a finite population of N elements by using a sample of n elements from this population (assuming n and N large enough for the mean X̄ to be approximately normally distributed). In this case we replace σ by

$$\sigma\sqrt{\frac{N-n}{N-1}}$$

in (10.23) and s by

$$s\sqrt{\frac{N-n}{N-1}}$$

in (10.24).


Exercise 10.3.

1. Establish 90% confidence limits of μ, the mean of the population of weights of zinc coatings from which the sample in Table 2.1 may be considered as having been drawn. (The sample mean and standard deviation are given in Section 5.2.)


2. In a sample of 270 bricks from a certain population, the mean of the transverse strengths is 999.8, and the standard deviation is 202.1. Construct 95% confidence limits of the mean transverse strength for the population.

3. It is known that a chance quantity X is distributed in a population in accordance with a Poisson distribution, but the constant m in the distribution is unknown. A sample of 200 elements from this population has a mean equal to 3.4. Construct 95% confidence limits of m. (Refer to Chapter 7 for the mean and variance of a Poisson distribution.)

4. Suppose that each person at a large national convention is supplied with a badge with a serial number on it. At this convention you look around through the corridors and record a sample of 100 numbers. You find that these 100 numbers add up to 24,520. Set up 90% confidence limits of k, the number of people registered at the convention. (See Problem No. 5, Exercise 9.2 for the sum of integers and sum of squares of integers from 1 to k.)

10.4 Confidence Limits of Means Determined from Small Samples.

If n, the sample size, is small, we cannot in general use the confidence limits (10.23) or (10.24) for μ. Under certain conditions, however, we can establish confidence limits for μ which are accurate enough for many practical situations. In many problems of practical statistical importance, the indefinitely large population from which we actually draw our sample is approximately normal itself. This is true of many populations of measurements. For example, we can see from Chapter 2 and from Figures 2.4 and 8.7 that it would probably not be rash to assume that the population from which the zinc coating weights could be assumed to have come is approximately normal. It will be remembered that in Section 9.3 it was stated that the theoretical sampling distribution of the mean X̄ in samples of size n from an indefinitely large normal population is exactly (not approximately) normal. This means that if X̄ is the mean of a sample from an indefinitely large normal population, the probability expressed on the left of (10.19) and (10.22) is exactly (not approximately) equal to α, no matter what value n has. If σ were known, then exact 100α% confidence limits of μ would be given by (10.23). But if the population is unknown (although assumed to be normal), we could not use these confidence limits because σ would be unknown. We can get around the difficulty by substituting the sample standard deviation s for the unknown population standard deviation σ. This is not unreasonable, since we may regard s as an estimate of σ.
We then write 

$$\Pr\left(-t_\alpha < \frac{\bar X-\mu}{s/\sqrt{n}} < t_\alpha\right) = \alpha, \tag{10.25}$$

where t_α is a quantity depending on n and α. More precisely, t_α depends on α
and on the number of degrees of freedom of s, which replaced σ in (10.19). It was
stated in Section 3.1 that the number of degrees of freedom of s is n - 1.

Values of t_α for the practically useful values of α and various numbers of degrees
of freedom are given in Table 10.2. It will be seen that where the number of
degrees of freedom is ∞, we have t_α = z_α. In other words, for indefinitely
large samples it makes no difference if we replace σ by s; one might expect
this to be the case, since the standard deviation of an indefinitely large sample
is the same as the standard deviation of the population from which the sample
came.

The mathematical argument for the validity of statement (10.25) is
beyond the scope of this course. The problem amounts to finding the theoretical
sampling distribution of the quantity

$$\frac{\bar X-\mu}{s/\sqrt{n}}$$

in samples from a normal population with mean μ. The sampling distribution of
this quantity does not involve the standard deviation of the normal population;
it is called the "Student" t distribution and resembles the normal distribution
quite closely. In fact, as n increases indefinitely, the t distribution approaches
a normal distribution with mean 0 and variance 1 as its limiting distribution.
Returning to (10.25), we see that for a given value of α and a given
sample, the only unknown quantity is μ. Thus if we are sampling from an
indefinitely large normal population with unknown mean μ, the confidence limits
of μ for confidence coefficient α are given by solving the equations

$$\frac{\bar X-\mu}{s/\sqrt{n}} = -t_\alpha, \qquad \frac{\bar X-\mu}{s/\sqrt{n}} = +t_\alpha, \tag{10.26}$$

where t_α is determined from Table 10.2 for n - 1 degrees of freedom. The
confidence limits of μ are

$$\bar X \pm t_\alpha\,\frac{s}{\sqrt{n}}. \tag{10.27}$$

TABLE 10.2

Values of t_α for α = .99, .95, .90 and

(The entries of this table are taken from Statistical Methods for Research Workers
by permission of the author, R. A. Fisher, and the publishers, Oliver and Boyd,
Edinburgh.)
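The entries of Table 10.2 are not reproduced here. As a rough numerical cross-check, t_α can be recovered by integrating the Student t density with Simpson's rule and bisecting until Pr(-t_α < t < t_α) = α. This is a sketch that assumes only Python's standard library; the helper names `t_density`, `central_prob` and `t_alpha` are hypothetical, and this is not how Fisher's table was computed. For 10 degrees of freedom and α = .95 it returns about 2.228, the value used in the blood-viscosity example.

```python
from math import exp, lgamma, log, pi


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def central_prob(x, df, steps=2000):
    """Pr(-x < t < x): twice the integral of the density from 0 to x,
    approximated by Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_density(0.0, df) + t_density(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 2 * total * h / 3


def t_alpha(alpha, df):
    """Find t_alpha with Pr(-t_alpha < t < t_alpha) = alpha by bisection."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, df) < alpha:  # widen the bracket until it holds t_alpha
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_alpha(0.95, 10), 3))  # 2.228
```

Simpson's rule is used only because it keeps the sketch free of third-party packages; any accurate quadrature or a library quantile function would serve equally well.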
These confidence limits of μ have been found to be satisfactory even
for small samples (fewer than 30) from populations which depart "moderately"
from normal populations, such as those frequently met in statistical practice
(populations of weights, lengths, scores, and other such measurements).

Example: A sample of 11 rats from a "control" population had an average blood
viscosity of 3.92 (units need not be specified) with a standard deviation of
.61. On the basis of this sample, establish 95% limits of μ, the mean blood
viscosity of the "control" population.

Here we have X̄ = 3.92, n = 11, s = .61, and α = .95. Looking at
Table 10.2 for n - 1 = 10 degrees of freedom and α = .95, we find t_α = 2.228.
Hence the 95% confidence limits for μ are given by substituting these quantities
in (10.27), i.e.,

$$3.92 \pm 2.228\,\frac{.61}{\sqrt{11}},$$

or

$$3.92 \pm .41.$$

We say that the probability is about .95 that the two confidence
limits 3.51 and 4.33 include the mean of the blood viscosity measurements of
the population of "control" rats being sampled. The main assumptions here are
that we are drawing a sample of 11 "at random" from a population in which
viscosity measurements are "almost" normally distributed.
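The arithmetic of this example can be sketched in Python as a direct transcription of (10.27). The function name `small_sample_limits` is hypothetical, and t_α = 2.228 is taken from the text rather than computed.

```python
from math import sqrt


def small_sample_limits(xbar, s, n, t_alpha):
    """100*alpha% confidence limits of the mean from (10.27):
    xbar +/- t_alpha * s / sqrt(n), with t_alpha read from Table 10.2
    for n - 1 degrees of freedom."""
    half_width = t_alpha * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Blood-viscosity example: n = 11, t_alpha = 2.228 for 10 d.f. and alpha = .95
lo, hi = small_sample_limits(3.92, 0.61, 11, 2.228)
print(f"{lo:.2f} to {hi:.2f}")  # 3.51 to 4.33
```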

Exercise 10.4.

1. Ten short pieces of copper wire from 10 rolls of wire have the following
breaking strengths (in lbs.): 578, 672, 570, 568, 572, 570, 570, 572, 596, 584.
Construct 90% confidence limits of μ, the mean of the breaking strengths for
the population from which the sample is considered to have been drawn.

2. Chemical determinations of percent of iron in five random batches of iron
ore from a certain deposit had an average of 22.3% and a standard deviation of
1.8%. Establish 95% confidence limits of the mean percent of iron in the
deposit.

3. Suppose a plot of land is surveyed by 5 student surveyors and they find the
following areas for the plot (in acres): 7.27, 7.24, 7.21, 7.28, 7.23. On the
basis of this information, construct 90% confidence limits of the area of the
plot.


10.5 Confidence Limits of Difference between Population Means Determined from
Large Samples.

In Section 9.4 we stated that if two large samples are drawn respectively
from two much larger finite populations or two indefinitely large populations,
then the difference between the sample means has a theoretical distribution
which is approximately normal. We may use this fact in getting approximate
confidence limits of the difference (μ - μ') of the two population means.

More specifically, suppose X̄ is the mean of a large sample of n elements
from an indefinitely large population with mean μ and standard deviation σ, and
X̄' is the mean of a large sample of n' elements from an indefinitely large
population with mean μ' and standard deviation σ'. Then it follows from Section
9.41 that we may write the following approximation:


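$$\Pr\left(-z_\alpha < \frac{(\bar X-\bar X')-(\mu-\mu')}{\sqrt{\dfrac{\sigma^2}{n}+\dfrac{\sigma'^2}{n'}}} < z_\alpha\right) \approx \alpha, \tag{10.28}$$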

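where z_α is the same quantity we have mentioned before, whose useful values are
given in Table 10.1. Expression (10.28) may be rewritten as

$$\Pr\left[(\bar X-\bar X') - z_\alpha\sqrt{\frac{\sigma^2}{n}+\frac{\sigma'^2}{n'}} < \mu-\mu' < (\bar X-\bar X') + z_\alpha\sqrt{\frac{\sigma^2}{n}+\frac{\sigma'^2}{n'}}\right] \approx \alpha, \tag{10.29}$$

showing explicitly that the 100α% confidence limits of μ - μ' are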


$$(\bar X-\bar X') \pm z_\alpha\sqrt{\frac{\sigma^2}{n}+\frac{\sigma'^2}{n'}}. \tag{10.30}$$

Again we state that if σ and σ' are unknown (remember that we are considering
large values of n and n'), we may substitute the sample variances s² and s'² for
the unknown σ² and σ'², thus giving us the confidence limits


$$(\bar X-\bar X') \pm z_\alpha\sqrt{\frac{s^2}{n}+\frac{s'^2}{n'}}. \tag{10.31}$$


An example will perhaps clarify the situation. 

Example: A certain psychological test was given to two groups (samples) of Army
prisoners: (a) first offenders and (b) recidivists. The sample statistics were
as follows (Betts data):

                         Sample    Sample    Sample standard
Population               size      mean      deviation
(a) first offenders      580       34.45     8.83
(b) recidivists          786       28.02     8.81


What are 95% confidence limits of the difference of the means for the two
populations?

Let μ and μ' be the means of the scores for the populations of first
offenders and of recidivists, respectively. We have n = 580, n' = 786, X̄ = 34.45,
X̄' = 28.02, s = 8.83, and s' = 8.81. For α = .95 we see from Table 10.1 that
z_α = 1.96. Hence the 95% confidence limits of (μ - μ') are found by substituting
in (10.31):

$$(34.45 - 28.02) \pm 1.96\sqrt{\frac{(8.83)^2}{580}+\frac{(8.81)^2}{786}},$$

or

$$6.43 \pm .95.$$


Thus, the probability is about .95 that the difference between the two population
means, i.e., (μ - μ'), is included between 5.48 and 7.38.
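A short Python transcription of (10.31), applied to the Betts data above, reproduces these limits. The function name `diff_limits_large` is hypothetical; z_α = 1.96 is the value read from Table 10.1.

```python
from math import sqrt


def diff_limits_large(xbar1, s1, n1, xbar2, s2, n2, z_alpha):
    """Approximate limits (10.31): (X - X') +/- z_alpha * sqrt(s^2/n + s'^2/n')."""
    diff = xbar1 - xbar2
    half_width = z_alpha * sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - half_width, diff + half_width


# First offenders (a) against recidivists (b)
lo, hi = diff_limits_large(34.45, 8.83, 580, 28.02, 8.81, 786, 1.96)
print(f"{lo:.2f} to {hi:.2f}")  # 5.48 to 7.38
```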

If we are working with large finite populations instead of indefinitely
large ones, then the only alteration we make in the confidence limits for (μ - μ')
is to replace σ² by σ²(N - n)/(N - 1) and σ'² by σ'²(N' - n')/(N' - 1) in (10.30),
where N and N' are the numbers of elements in the two populations respectively.


10.51 Confidence limits of the difference p - p' in two binomial populations.

In case the two populations involved are binomial populations, where
p is the probability of "success" in the first population and p' is that in
the second population, then μ = p and μ' = p', σ² = pq and σ'² = p'q'. In this
case the sample means X̄ and X̄' simply become the proportions of "successes" in
the samples, and (μ - μ') is simply (p - p'), the difference between the proportions
of "successes" in the populations. The 100α% confidence limits of (p - p') are
therefore (for large values of n and n')

$$(\bar X-\bar X') \pm z_\alpha\sqrt{\frac{pq}{n}+\frac{p'q'}{n'}}, \tag{10.32}$$

which is a special case of (10.30). Since p and p' are unknown, we replace p
by X̄ and p' by X̄' (which can be done only in the case of samples from binomial
populations) and obtain 100α% confidence limits of p - p' as follows:


$$(\bar X-\bar X') \pm z_\alpha\sqrt{\frac{\bar X(1-\bar X)}{n}+\frac{\bar X'(1-\bar X')}{n'}}. \tag{10.33}$$


Let us illustrate this by an example.

Example: Suppose a sample of 400 voters of town A showed 230 in favor of
candidate K and a sample of 500 voters of town B showed 200 voters in favor of
candidate K. Construct 95% confidence limits of (p - p') (the fraction of voters
in town A who support K minus the fraction in town B who support K).

Let us regard the populations of voters in A and in B as being indefinitely
large. We have the following sample values:

$$\bar X = \frac{230}{400} = .575, \qquad \bar X' = \frac{200}{500} = .400.$$

For α = .95, z_α = 1.96. Hence the 95% confidence limits of p - p' are

$$(.575-.400) \pm 1.96\sqrt{\frac{(.575)(.425)}{400}+\frac{(.400)(.600)}{500}},$$

or

$$.175 \pm .065.$$

Hence the probability is about .95 that the fraction of voters in town A
supporting candidate K minus the fraction of voters in town B supporting candidate
K is included between .110 and .240.
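The same computation, written as a sketch of (10.33) that starts from the raw counts of "successes": the function name `diff_proportion_limits` is hypothetical, and z_α = 1.96 is taken from Table 10.1.

```python
from math import sqrt


def diff_proportion_limits(x1, n1, x2, n2, z_alpha):
    """Approximate limits (10.33) for p - p' from counts of "successes"."""
    p1, p2 = x1 / n1, x2 / n2  # sample proportions X and X'
    half_width = z_alpha * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - half_width, (p1 - p2) + half_width


# Town A: 230 of 400 favor K; town B: 200 of 500 favor K
lo, hi = diff_proportion_limits(230, 400, 200, 500, 1.96)
print(f"{lo:.3f} to {hi:.3f}")  # 0.110 to 0.240
```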


10.52 Confidence limits of the difference of two population means in case of
small samples.

In experimental work, it frequently happens that we have to deal with
two small samples (fewer than 30, say) which can be assumed to come from two
indefinitely large populations which are "nearly" normal and which have equal
variances. Thus in an experiment, we may make measurements on a sample of
individuals in a "control" group, and make similar measurements on a sample of
individuals in an "experimental" group. In such a situation it often happens
that the measurements in the "experimental" group "appear" to come from a
population which has nearly the same variance as the population from which the
"control" group comes, but which has a different mean.

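If the two population variances are equal, i.e., σ'² = σ², then

$$\sqrt{\frac{\sigma^2}{n}+\frac{\sigma'^2}{n'}} = \sigma\sqrt{\frac{1}{n}+\frac{1}{n'}},$$

and (10.29) may be written as

$$\Pr\left[(\bar X-\bar X') - z_\alpha\,\sigma\sqrt{\frac{1}{n}+\frac{1}{n'}} < \mu-\mu' < (\bar X-\bar X') + z_\alpha\,\sigma\sqrt{\frac{1}{n}+\frac{1}{n'}}\right] \approx \alpha. \tag{10.34}$$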


If the two populations are normal, the expression on the left in (10.34) is
exactly equal to α. Now if σ is unknown, as is in fact usually the case, we
shall proceed to replace it by an estimate which we can make from the two
sample standard deviations. The estimate we use for σ² is

$$\frac{(n-1)s^2+(n'-1)s'^2}{n+n'-2}. \tag{10.35}$$

This does not look unreasonable. For (n - 1)s² is the sum of squares of deviations
of the X's in the first sample from their mean, and (n' - 1)s'² is a similar quantity
for the second sample. Adding them we get the sum of squares of deviations for
both samples, which we are assuming to come from populations with the same
variance. The number of degrees of freedom "contributed" by (n - 1)s² is n - 1, and
the number "contributed" by (n' - 1)s'² is n' - 1; thus the total number of degrees
of freedom is n + n' - 2, which is seen to be the divisor in (10.35).

Now we can construct confidence limits for (μ - μ') from two small samples
on the basis of the following statement:

If two samples of n and n' elements respectively are drawn from two
normal populations having the same variance, and if X̄ and X̄' are the sample
means and s² and s'² are the sample variances, then the probability is α that
(μ - μ') will be included between the two values

$$(\bar X-\bar X') \pm t_\alpha\sqrt{\frac{(n-1)s^2+(n'-1)s'^2}{n+n'-2}}\,\sqrt{\frac{1}{n}+\frac{1}{n'}}, \tag{10.36}$$

where t_α is determined from Table 10.2 for n + n' - 2 degrees of freedom. This
is simply to say that (10.36) gives the 100α% confidence limits of (μ - μ'). In
(10.36) the number of degrees of freedom of the estimate we have used for σ² is
n + n' - 2. This means that for a given α we look up t_α in Table 10.2 under
that value of α and for the number of degrees of freedom equal to n + n' - 2.
Example: Two methods of determining the nickel content of steel, say A and B, are
tried out on a certain kind of steel. Samples of four determinations are made
by each method, with the following results (Raithel data):

Method    Sample size    Sample mean    Sample variance
A         4              3.285%         .000033
B         4              3.258%         .000092

Construct 95% confidence limits of the difference of the means of the populations
associated with methods A and B respectively.

We have n = n' = 4, X̄ = 3.285, X̄' = 3.258, 3s² = .000099, 3s'² = .000276,
and the number of degrees of freedom (n + n' - 2) = 6. For α = .95 and 6 degrees
of freedom, we have from Table 10.2 t_α = 2.447. Hence the confidence limits of
μ - μ' are obtained by substituting in (10.36):

$$.027 \pm 2.447\sqrt{\frac{.000099+.000276}{6}}\,\sqrt{\frac{1}{4}+\frac{1}{4}},$$

or

$$.027 \pm .014.$$

Thus the probability is about .95 that the difference (μ - μ') between the means
of the two populations is included between .013% and .041%.
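A sketch of (10.36) with the pooled variance estimate (10.35), applied to the nickel determinations. The function name `pooled_diff_limits` is hypothetical; t_α = 2.447 for 6 degrees of freedom is the value quoted above from Table 10.2.

```python
from math import sqrt


def pooled_diff_limits(xbar1, var1, n1, xbar2, var2, n2, t_alpha):
    """Limits (10.36), assuming equal population variances; t_alpha comes
    from Table 10.2 for n + n' - 2 degrees of freedom."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)  # (10.35)
    half_width = t_alpha * sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# Method A against method B, four determinations each
lo, hi = pooled_diff_limits(3.285, 0.000033, 4, 3.258, 0.000092, 4, 2.447)
print(f"{lo:.3f} to {hi:.3f}")  # 0.013 to 0.041
```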

Exercise 10.5.

1. An attitude test was given to two groups of soldiers: (a) a group of
operative soldiers (who had been in the Army for a while) and (b) new selectees.

The information on the two samples of soldiers was as follows (Betts data):

          Sample    Sample    Sample
Sample    number    mean      standard deviation
(a)       1050      47.65     6.77
(b)       531       46.10     6.79

If μ is the mean of the population from which (a) came and μ' that of the
population from which (b) came, establish 95% confidence limits of μ - μ'.

2. The following question was asked in a poll of 148 men and 152 women in
Trenton: "Do you approve or disapprove of the practice of tipping by and large?"
The results were as follows (Crespi data):

          Sample    No. who answered
Group     size      "yes"
Men       148       89
Women     152       116

Construct 95% confidence limits of the difference between the proportion of
"yeses" among the population of men in Trenton and the proportion of "yeses"
among women.


3. In an experiment on two groups of rats, (A) "Normal" and (B) "Adrenalectomized",
the following blood viscosity readings were found (Nice and Fishman data):


(A)

(B) 

3.29 

3.45 

3.91

4.26 

4.64

4.71

3.55 

3.14 

3.67 

3.45 

4.18 

5.01 

3.74 

4.43 

4.67 

4.91 

3.03 

4.22 

4.61 

4.83 

3.84

3.55


Construct 95% confidence limits of the difference between the mean blood
viscosity of the population of "normal" rats and that of the population
of "adrenalectomized" rats.




CHAPTER 11. STATISTICAL SIGNIFICANCE TESTS.

11.1 A Simple Significance Test.

The problem of making statistical significance tests is very closely
related to that of determining confidence limits. Roughly speaking, a statistical
significance test is a probability test, based on sampling theory, to
determine whether a sample could "reasonably" have come from a specified
population, or from a population with specified values of its parameters. Let us
illustrate by a simple example.

Example: A claims he can roll a die in such a way that he can make aces come
up more often than the "average person" can. He demonstrates by rolling a die
600 times and turning up 120 aces. He claims that this is an average of one
ace in five rolls instead of one in six, which proves his case. How can we test
whether 120 aces in 600 rolls is "unreasonably" large on the basis of the
behavior of a normal die?

Our approach to this question is this: Suppose the die is true and
that it is falling according to the "laws" of a true die. We want to find the
probability of getting 120 aces or more with a true die. If this probability
is not "too small" we can discredit A's claim. If X is the number of aces in
600 rolls of a true die, it has the binomial distribution

$$f(x) = \binom{600}{x}\left(\frac{1}{6}\right)^{x}\left(\frac{5}{6}\right)^{600-x}.$$

This is a binomial distribution with p = 1/6, n = 600. X is approximately
normally distributed with mean np = 100 and standard deviation √(npq) = 9.13.
If we set

$$Z = \frac{X-100}{9.13},$$

then Z is approximately normally distributed with mean 0 and variance 1. Let
us choose some small probability level, say .01, and find a number such that the
probability of X exceeding that number is .01. We find from Table 8.1 that
the value of z for this probability level is 2.33. The value of x corresponding
to this value of z is 100 + (2.33)(9.13) = 121.3, which is larger than the
sample value 120. We say that the 120 aces obtained by A do not significantly exceed
the 100 aces expected from a true die, at the 1% probability level. Another way
to say it is this: We tested the hypothesis that p = 1/6 on the basis of the
sample, and we did not reject the hypothesis at the 1% probability level.

The hypothesis that p = 1/6 would be referred to as the null hypothesis
in this problem. This means that if we agree to work at the 1% probability
level, A cannot be regarded as having proved his claim on the basis of the evidence
provided by the 600 throws: 120 aces or more in 600 rolls will occur with a
probability greater than .01 with an ordinary "true" die behaving like a "true"
die. If A had obtained 150 aces, say (or any number more than 121), then the
difference between this number and the 100 aces expected from a true die would have
been statistically significant at the 1% probability level. We would then have to
admit that the data available support A's claim at the 1% probability level.
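The normal-approximation figures in this example can be checked with a few lines of Python. As an extra check that is not part of the text, the sketch also sums the exact binomial probabilities of 120 or more aces using `math.comb` (Python 3.8 or later); it only confirms that this exact tail is also above .01. The 2.33 cutoff is the value quoted from Table 8.1.

```python
from math import comb, sqrt

n, p, aces = 600, 1 / 6, 120
mean = n * p                   # np, i.e. 100
sd = sqrt(n * p * (1 - p))     # sqrt(npq), about 9.13
z = (aces - mean) / sd
cutoff = mean + 2.33 * sd      # X value at the one-sided 1% point, about 121.3

# Exact binomial probability of 120 or more aces with a true die
tail = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(aces, n + 1))

print(round(z, 2), round(cutoff, 1), tail > 0.01)  # 2.19 121.3 True
```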


11.2 Significance Tests by Using Confidence Limits.

We can also look at this problem from the point of view of confidence
limits. For let p be the probability of getting an ace, and let us ask whether
the 99% confidence limits of p will include the value 1/6 (the value of p in
the case of a true die) between them. We do not have to find the 99% confidence
limits of p to answer this question. All we have to do is to note from Table
10.1 that for α = .99, z_α = 2.576, and see whether p = 1/6 will satisfy the
double inequality

$$-2.576 < \frac{120-600p}{\sqrt{600p(1-p)}} < +2.576.$$

For p = 1/6, the expression in the middle part of the inequality is

$$\frac{120-100}{\sqrt{600\cdot\frac{1}{6}\cdot\frac{5}{6}}} = \frac{20}{9.13} = 2.19,$$

which clearly lies between -2.576 and +2.576. In other words, since 2.19 lies
between -2.576 and 2.576, 120 aces does not differ significantly from the
expected number 100, and A's claim is not supported at the 99% confidence level.

In using the confidence interval method, we ask whether the absolute
value of the difference between the number of aces obtained and that expected
is statistically significant, i.e., whether the number of aces obtained was
significantly high or low. This is the reason that z_α = 2.576 in the case of two
confidence limits, and z = 2.33 in the case originally considered, in which we
were asking whether 120 is significantly larger than 100.

In many situations a statistical significance test made on the basis 
of a sample can be easily made by simply checking whether a value of a popula¬ 
tion parameter in which we are particularly interested (which is specified in 
the hypothesis being tested) lies between confidence limits for that parameter
as determined from the sample. This can often be done without actually finding 
the confidence limits. 

Let us consider an example involving the difference between two means. 
Example: Suppose two machines, say A and B, are packaging "6-ounce" cans of
talcum powder, and that 100 cans filled by each machine are emptied and the 
contents carefully weighed. Suppose the following sample values are found: 



           Sample    Mean weight of       Standard
Machine    size      contents (oz.)       deviation
A          100       6.11                 .04
B          100       6.14                 .05


Are these means significantly different at the 99% confidence level?

What we are asking here is this: Could these two samples reasonably
have come from populations having equal means? To answer this we find out
whether the value (μ - μ') = 0 is included between the 99% confidence limits of
(μ - μ'). In this case, no additional effort is required to actually find the
99% confidence limits of (μ - μ'). They are

$$(6.14-6.11) \pm 2.576\sqrt{\frac{.0016}{100}+\frac{.0025}{100}},$$

or

$$.03 \pm .016.$$


The confidence limits are .014 and .046, which do not include 0 between them.
Hence we conclude that at the 99% confidence level the difference between the
two sample means is significantly different from 0. In other words, we have
tested the hypothesis (null hypothesis) that (μ - μ') = 0 and we reject it at
the 99% confidence level. This means that we can be practically certain on the
basis of the two samples that machine B is putting more powder into the cans
on the average than machine A.
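The check just made (does 0 fall between the limits (10.31)?) can be sketched as a small Python function. The name `differs_significantly` is hypothetical, and z_α = 2.576 is the 99% value from Table 10.1.

```python
from math import sqrt


def differs_significantly(xbar1, s1, n1, xbar2, s2, n2, z_alpha):
    """Return (reject, limits): reject is True when 0 lies outside the
    large-sample limits (10.31) for mu - mu'."""
    diff = xbar1 - xbar2
    half_width = z_alpha * sqrt(s1**2 / n1 + s2**2 / n2)
    lo, hi = diff - half_width, diff + half_width
    return not (lo < 0 < hi), (lo, hi)


# Machine B (6.14 oz., s = .05) against machine A (6.11 oz., s = .04)
reject, (lo, hi) = differs_significantly(6.14, 0.05, 100, 6.11, 0.04, 100, 2.576)
print(reject, f"{lo:.3f} to {hi:.3f}")  # True 0.014 to 0.046
```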


11.3 Significance Tests without the Use of Population Parameters.

In making some significance tests one often has first to set up a
null hypothesis and then to determine the probability distribution or
theoretical sampling distribution under the null hypothesis. In such a case one
does not have an opportunity to use confidence limits. An example will make this
clear.

Example: Refer to Problem No. 5 in Exercise 5.1. Suppose the person who is
smoking the cigarettes claims he can distinguish brands A, B, C, D blindfolded.
How would we test his claim?

An experiment is set up as described in that problem. The null
hypothesis here is that he cannot distinguish the cigarettes and that any
assignment of the letters A, B, C, D he makes is done "at random", which means that
all assignments are considered to be equally likely. If X is a chance quantity
denoting the number of correct assignments, then under the null hypothesis the
4! = 24 different assignments provide the following probability distribution of
X:


x              0       1       2       3       4
Probability    9/24    8/24    6/24    0       1/24


Now we consider ability to identify to be indicated or measured by large values
of X. The largest value of X possible is 4 and its probability is 1/24, which
is about .0417. Since .0417 is greater than .01, it is clear that if we adopt a
1% probability level we do not have an opportunity to reject the null hypothesis
at this probability level. Not enough experimenting has been done. But suppose
this entire experiment were repeated twice and all four brands were identified
in both experiments. The probability of this happening under the null hypothesis
is (1/24)² = 1/576. Under this condition we would reject the null hypothesis at
the 1% probability level and say that this person has made a significantly large
number of correct identifications; we would conclude that he has some ability
to discriminate among these brands of cigarettes.
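The distribution of X under the null hypothesis can be verified by listing all 4! assignments, as in this short sketch using only Python's standard library (`itertools.permutations` and `fractions.Fraction`). It counts, for each assignment, how many brands are matched correctly.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

brands = "ABCD"
# Count correct identifications for each of the 4! = 24 equally likely assignments
counts = Counter(
    sum(guess == actual for guess, actual in zip(assignment, brands))
    for assignment in permutations(brands)
)
dist = {x: Fraction(counts[x], 24) for x in range(5)}
print([counts[x] for x in range(5)])  # [9, 8, 6, 0, 1]
print(dist[4], dist[4] ** 2)          # 1/24 1/576
```

Note that X = 3 is impossible: if three brands are matched correctly, the fourth must be as well.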
Exercise 11.

1. Suppose a sack of 400 nickels of a new design is emptied on a table, and
235 heads appear. Does this number differ significantly at the 1% probability
level from that expected of a "true" coin (for which heads and tails are equally
likely)?

2. A poll of 400 men and of 400 women in a certain town shows 270 of the men
and 240 of the women in favor of a certain proposal. Is there a significant
difference between the opinion of the men and the opinion of the women on this
proposal at the 5% probability level?

3. Suppose a random sample of 50 entering freshmen at College A has a mean
S.A.T. score of 560 and a standard deviation of 90, while a sample of 50 entering
freshmen at College B has a mean S.A.T. score of 565 and a standard deviation
of 95. Test, at the 5% probability level, the null hypothesis that the
populations of entering freshmen at the two colleges have the same mean S.A.T.
score. Assume each college is admitting 500 freshmen.

4. Methods A (dichromate method) and B (thioglycolic acid method) of determining
the iron content of ore were tried out on each of 21 batches of iron ore. The
difference between the percent of iron found by method A and that found by
method B was obtained for each batch. These differences in percent were as
follows (Mehlig and Shepherd data):


+0.05   -0.05   +0.04   +0.04   +0.03   +0.03   -0.05
-0.02   +0.09   -0.10    0.00   +0.05   +0.02   -0.07
+0.05   +0.01   -0.01    0.00   +0.09   -0.05   -0.01


Is there any significant difference between the mean iron content yields as
determined by the two methods at the 5% probability level?

5. It is known that the probability of getting no aces in a hand of bridge
under perfect shuffling is about .3. A complains about the cards and/or
shuffling on the basis of the fact that in 4 hands he got only one ace. Does
he have any grounds for complaint at the 1% probability level? (Use the binomial
distribution in making the significance test.)
6. A bag has 100 chips in it; some are white and the rest are red. A draws a
chip and notices it is white. He returns the chip to the bag, mixes up the
chips and draws again. He repeats this 5 times, obtaining a white chip each
time. What is the smallest number of white chips which can be in the bag
without making A's 5 white chips a significantly large number of white chips
to be drawn in 5 draws at the 1% probability level?

7. A 5-cent plastic die was rolled 100 times and a total of 370 dots was
obtained. Test the null hypothesis that this die is true at the 1% probability
level. (Hint: make use of the approximate normality of S(X).)


CHAPTER 12. TESTING RANDOMNESS IN SAMPLES.


12.1 The Idea of Random Sampling.

The most fundamental concept underlying probability theory and
statistical inference is that of randomness. We have essentially taken this as
an undefined concept throughout all of the previous chapters. We have merely
implied, in a general way, that randomness means something like the haphazardness
with which values of a chance quantity X vary from trial to trial, or from
drawing to drawing, in an actual experiment or sample. In order to gain some
idea as to whether the successive measurements in a sample are exhibiting this
property of randomness, we must make use of the information contained in the
order in which the measurements occur in building up the sample. In all of
the discussion in the previous chapters we have ignored this order information
and have considered only the frequency distribution of the magnitudes of the
individual measurements in the sample. But we are now in a position to make
use of what we know about sampling and significance tests to make a more
definite statement about whether the successive observations on a chance
quantity X (i.e., the successive measurements in a sample) are behaving
in a "random" manner.

12.2 Runs.

Many forms of non-randomness could conceivably exhibit themselves in
a sample of n elements. Suppose, for example, that a coin is tossed 20 times,
and let us consider three "kinds" of sequences as follows:

Sample I:    TTTTHHTTTTTTHHHHHHHH

Sample II:   HTHTHTHTHTHTHTHTHTHT

Sample III:  TTHHTHTTHTHHTTTHTHHH

What features of these sequences (samples), if any, are most likely to arouse
suspicion of non-randomness? In the case of I, it is the fact that there are
long runs (and few of them) of H's and T's. In the case of II, it is the fact
that there are short runs (and many of them) as well as regularity of H's and
T's. In the case of III, we probably would not be suspicious.
Again, let us consider the outside diameters of 15 shafts being turned
out by an automatic lathe (a shaft being picked at random every half hour).
Consider three samples of shaft diameters and suppose they look like this
(measurements in inches):

TABLE 12.1

Samples of Shaft Diameters (in inches)

Sample A    Sample B    Sample C
.501        .502        .505
.502        .501        .501
.502        .500        .502
.503        .503        .502
.502        .501        ?
.503        .502        .502
.505        .503        .507
.504        .509        .505
.505        .508        .503
.505        .509        .504
.506        .509        .502
.505        .510        .505
.507        .509        .506
.507        .510        .502
.508        .510        .501


Presented graphically, these three samples are pictured in Figure 12.1 (the
a's and b's are explained later). What features of these sequences are most
likely to arouse suspicion as to whether they are "sufficiently" random? In
the case of Sample A, the feature which makes one wonder is the way in which
the 15 measurements in the sample seem to rise in a general way as the
sampling progresses. In the case of Sample B, the suspicious feature is the jump
after the seventh drawing to a generally higher level. Sample C seems to have
a reasonable degree of randomness.

These features can be made more objective in the following way:
Referring to Figure 12.1, suppose we draw a horizontal line (median line) so
that there are as many points above the line as below it. Then for every point
below the median line write b under that point and below the horizontal axis;
for every point above the median line write a in a similar way. (See Figure
12.1.) For samples A, B and C we have the following rows of a's and b's (14
in each sample):
Sample A    bbbbbbbaaaaaaa
Sample B    bbbbbbbaaaaaaa
Sample C    abbbabaaabaabb


In other words, we have reduced the three samples of diameter readings to three
samples of a's and b's. Samples A and B, which looked suspiciously non-random
when viewed graphically, also look suspiciously non-random when viewed in terms
of a's and b's.
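The conversion to a's and b's can be sketched in Python for Sample B, whose diameters are read here from Table 12.1 in drawing order. The function name `median_letters` is hypothetical. As a simplification, points lying exactly on the median line are dropped; the rule for assigning several tied points to one side is not implemented.

```python
from statistics import median


def median_letters(values):
    """Write 'a' for points above the median line and 'b' for points below.
    Points exactly on the line are dropped (tie-assignment rule not handled)."""
    m = median(values)
    return "".join("a" if x > m else "b" for x in values if x != m)


sample_b = [.502, .501, .500, .503, .501, .502, .503,
            .509, .508, .509, .509, .510, .509, .510, .510]
print(median_letters(sample_b))  # bbbbbbbaaaaaaa
```

For an odd number of points, `statistics.median` returns the middle reading itself, so exactly one point (.508 here) falls on the line and 14 letters remain.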


[Figure 12.1. Panels: SAMPLE A, SAMPLE B. Diameter plotted against observation number, with a horizontal median line in each panel; the row beneath Sample A reads bbbbbbb aaaaaaa.]
In trying to draw a median line for samples of 10 to 50, one may find
that there is no such line, and that the line coming nearest to dividing the points
into two sets having equal numbers of points will pass through several points
(usually two or three in practice). In this case one may assign one or more of
these points on the line to the side having the smaller number of points in order
to make the number of points above the median line equal to the number below. The
assignment of each point should be made so as to increase the number of runs.

So far, we have just used our common sense or intuition as to whether 
the observations in these various samples appear to be random. In this intuitive 
analysis we have associated non-randomness with a few very long runs (as in 
Sample I of the coin-tossing example and Samples A and B of the shaft-turning 
example), or many very short runs (as in Sample II of the coin-tossing example). 
Now, it is a matter of experience that the case of many very short runs is not 
very important in the usual statistical applications. "Naturally occurring" 
causes or factors which produce many very short runs do not occur very often. 

But there are many factors in statistical situations which may cause a few very
long runs. For example, in Sample I, poor tossing could cause the long
runs. In Sample A, wear of the cutting tool on the lathe could cause a gradual
increase in diameters of shafts and easily account for the apparent non-randomness.
In Sample B, the cause of non-randomness could be a slip in tool setting
at about the time that the eighth element in the sample was drawn. For practical
purposes, it has been found that a satisfactory indication of non-randomness
is the total number of runs in the sample, i.e., the total number of isolated
"bunches" of H's and T's (or of a's and b's), each "bunch" containing one or
more similar letters. If we call the total number of runs U, then U has the
following values in the 6 illustrative samples we have discussed:


TABLE 12.2

Sample    I    II    III    A    B    C
U         4    20    12     2    2    8
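Counting U is purely mechanical. A short Python sketch (the function name `count_runs` is ours, not the book's) reproduces the entries for Samples A, B and C in Table 12.2 from the rows of a's and b's given earlier:

```python
from itertools import groupby


def count_runs(seq: str) -> int:
    """U = total number of runs, i.e. maximal "bunches" of identical letters."""
    # groupby produces one group for each maximal block of equal consecutive letters
    return sum(1 for _ in groupby(seq))


# a/b sequences for Samples A, B and C as given earlier in this section
samples = {"A": "bbbbbbbaaaaaaa", "B": "bbbbbbbaaaaaaa", "C": "abbbabaaabaabb"}
for name, seq in samples.items():
    print(name, count_runs(seq))  # prints A 2, B 2, C 8 on separate lines
```

The same function applies unchanged to sequences of H's and T's.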


If we consider the case of many very short runs as not occurring often enough
in statistical situations to be of much practical interest, we may then regard
small values of U as a more important indication of non-randomness than large
values of U. U is a criterion for testing randomness with respect to bunching,
which is an order characteristic. It should be emphasized that other criteria
could be considered for testing randomness with respect to other characteristics.
But how do we actually decide how small (or how large) U has to be in
any given case to show non-randomness beyond a "reasonable doubt"? We consider
all possible permutations of the elements in a sample and find the value of
U (the total number of runs) for each permutation. Considering all of these
permutations equally likely, we can then obtain a probability distribution of U.
In practical situations in which graphs containing from 10 to 60 sample points
similar to those shown in Figure 12.1 are constructed and runs of a's and b's are
determined, one finds that it is sufficient to consider the case in which the
number of a's is equal to the number of b's. Suppose there are m a's and m b's.
Then the probability distribution of U (i.e., the value of Pr(U = u)) is given by


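$$\Pr(U = 2k) = \frac{2\binom{m-1}{k-1}^{2}}{\binom{2m}{m}}, \qquad \Pr(U = 2k+1) = \frac{2\binom{m-1}{k-1}\binom{m-1}{k}}{\binom{2m}{m}}, \qquad (12.1)$$

for k = 1, 2, ..., so that U takes the values 2, 3, ..., 2m.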


This probability distribution is derived under the null hypothesis that the
sequence is random, which means that all permutations of a's and b's are considered
equally likely.

The argument for formula (12.1) involves permutation analysis and we
shall not present it here. A similar formula, not much more complicated,
could be written down for m a's and m' b's where m and m' are not equal, but the
case m = m' is satisfactory for present purposes.
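Although the derivation is omitted, formula (12.1) is easy to evaluate and to check. The Python sketch below, with function names of our own choosing, computes Pr(U = u) exactly with fractions and, for small m, compares it with a direct count over every arrangement of m a's and m b's, which is precisely the "all permutations equally likely" model of the null hypothesis:

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def runs_pmf(m: int) -> dict[int, Fraction]:
    """Exact Pr(U = u) from formula (12.1), for m a's and m b's."""
    total = comb(2 * m, m)  # number of distinct, equally likely arrangements
    pmf = {}
    for u in range(2, 2 * m + 1):
        k = u // 2
        if u % 2 == 0:  # u = 2k
            ways = 2 * comb(m - 1, k - 1) ** 2
        else:  # u = 2k + 1
            ways = 2 * comb(m - 1, k - 1) * comb(m - 1, k)
        pmf[u] = Fraction(ways, total)
    return pmf


def runs_pmf_by_enumeration(m: int) -> dict[int, Fraction]:
    """Same distribution by listing every arrangement of m a's and m b's."""
    counts: dict[int, int] = {}
    arrangements = 0
    for a_positions in combinations(range(2 * m), m):
        seq = ["a" if i in a_positions else "b" for i in range(2 * m)]
        u = 1 + sum(1 for x, y in zip(seq, seq[1:]) if x != y)
        counts[u] = counts.get(u, 0) + 1
        arrangements += 1
    return {u: Fraction(c, arrangements) for u, c in counts.items()}


print(runs_pmf(5)[2])  # chance of only 2 runs with 5 a's and 5 b's; prints 1/126
```

The enumeration grows like the binomial coefficient (2m choose m), so it is only a check for small m; the formula itself is cheap for any m in Table 12.3.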

In making a statistical significance test of the value of U obtained
in a sample, we follow the usual practice and choose a probability level, say
.01. Then for a given value of m (number of a's or b's) we find a critical
value of u, say u_.01, such that the probability of U being less than or equal
to u_.01 is at most .01. Then in a given example if U turns out to be less than
or equal to the value of u_.01 applicable to that example, we say that there is
a significant amount of non-randomness in the sample drawings at the 1%
probability level.
Similarly, if we should be interested in the use of large values of
U as an indication of non-randomness, we would choose a critical value u'_.01
so that the probability of U being greater than u'_.01 is at most
.01. Table 12.3 shows values of u_.01 and u_.05 (significantly small values of
U at the 1% and 5% probability levels, respectively), and values of u'_.01 and
u'_.05 (significantly large values of U at the 1% and 5% probability levels,
respectively) for values of m running from 5 to 30.

TABLE 12.3

Tables of Critical Values of U

     m            Significantly small       Significantly large
 (= no. a's       critical values of U      critical values of U
  = no. b's)       u_.05      u_.01          u'_.05     u'_.01

     5               3          2               8          9
     6               3          2              10         11
     7               4          3              11         12
     8               5          4              12         13
     9               6          4              13         15
    10               6          5              15         16
    11               7          6              16         17
    12               8          6              17         19
    13               9          7              18         20
    14              10          8              19         21
    15              11          9              20         22
    16              11         10              22         23
    17              12         10              23         25
    18              13         11              24         26
    19              14         12              25         27
    20              15         13              26         28
    21              16         14              27         29
    22              17         14              28         31
    23              17         15              30         32
    24              18         16              31         33
    25              19         17              32         34
    26              20         18              33         35
    27              21         19              34         36
    28              22         19              35         38
    29              23         20              36         39
    30              24         21              37         40

(Reproduced by courtesy of C. Eisenhart and Freda S. Swed.)
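The entries of Table 12.3 can be cross-checked from formula (12.1). In this Python sketch (names are our own), u is the largest value with Pr(U ≤ u) at most α, and u' is the smallest value with Pr(U > u') at most α; with this reading of the large-value columns, significance for too many runs means that U exceeds u'. Under these definitions the sketch reproduces the rows we checked (m = 5 to 11 and m = 13).

```python
from fractions import Fraction
from math import comb


def runs_pmf(m):
    """Exact Pr(U = u) from formula (12.1), for m a's and m b's."""
    total = comb(2 * m, m)
    pmf = {}
    for u in range(2, 2 * m + 1):
        k = u // 2
        if u % 2 == 0:
            ways = 2 * comb(m - 1, k - 1) ** 2
        else:
            ways = 2 * comb(m - 1, k - 1) * comb(m - 1, k)
        pmf[u] = Fraction(ways, total)
    return pmf


def critical_values(m, alpha):
    """Return (u, u_prime) at level alpha, e.g. alpha = "0.05".

    u:       largest value with Pr(U <= u) <= alpha (too few runs if U <= u)
    u_prime: smallest value with Pr(U > u_prime) <= alpha (too many runs if U > u_prime)
    """
    alpha = Fraction(str(alpha))  # exact decimal, avoids binary rounding of floats
    pmf = runs_pmf(m)
    u, cumulative = None, Fraction(0)
    for value in range(2, 2 * m + 1):
        cumulative += pmf[value]
        if cumulative > alpha:
            break
        u = value
    u_prime, tail = 2 * m, Fraction(0)
    for value in range(2 * m, 1, -1):
        tail += pmf[value]  # tail is now Pr(U >= value) = Pr(U > value - 1)
        if tail > alpha:
            break
        u_prime = value - 1
    return u, u_prime


for m in range(5, 11):
    small = [critical_values(m, a)[0] for a in ("0.05", "0.01")]
    large = [critical_values(m, a)[1] for a in ("0.05", "0.01")]
    print(m, *small, *large)  # first line printed: 5 3 2 8 9
```

For m = 12 the same rule gives u_.01 = 7, whereas Table 12.3 lists the more conservative value 6, so treat the sketch as a cross-check rather than a replacement for the table.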


As an illustrative example, we may ask whether the value of U in
Sample A (see Table 12.2) is significantly low. The value of U here is 2,
and m = 7, while the critical value of U at the 1% probability level is 3.
Hence we state that the value of U in Sample A is significantly low at that
probability level, indicating a significant degree of non-randomness in the
sample. Similarly in the case of Sample B. In the case of Sample I (considering
H's as a's and T's as b's) we have U = 4, and m = 10. The critical value
at the 1% probability level is u_.01 = 5, thus indicating that there is a
significant amount of non-randomness of H's and T's in Sample I. In the case of
Sample II, the number of runs is 20 and the critical number at the 1% probability
level is (m = 10) u'_.01 = 16, thus showing that non-randomness in Sample II
was indicated by too many runs.
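Instead of comparing U with a single critical value, one can also compute the exact probability, under the hypothesis of randomness, of a run count at least as extreme as the one observed. A minimal Python sketch for the three examples just discussed (the helper names are our own):

```python
from fractions import Fraction
from math import comb


def runs_pmf(m):
    """Exact Pr(U = u) from formula (12.1), for m a's and m b's."""
    total = comb(2 * m, m)
    pmf = {}
    for u in range(2, 2 * m + 1):
        k = u // 2
        second = comb(m - 1, k - 1) if u % 2 == 0 else comb(m - 1, k)
        pmf[u] = Fraction(2 * comb(m - 1, k - 1) * second, total)
    return pmf


def prob_at_most(u_obs, m):
    """Pr(U <= u_obs): chance of this few runs or fewer under randomness."""
    return sum(p for u, p in runs_pmf(m).items() if u <= u_obs)


def prob_at_least(u_obs, m):
    """Pr(U >= u_obs): chance of this many runs or more under randomness."""
    return sum(p for u, p in runs_pmf(m).items() if u >= u_obs)


print(float(prob_at_most(2, 7)))     # Sample A: U = 2, m = 7, about 0.0006
print(float(prob_at_most(4, 10)))    # Sample I: U = 4, m = 10, about 0.001
print(float(prob_at_least(20, 10)))  # Sample II: U = 20, m = 10, about 0.00001
```

Each probability is below .01, in agreement with the conclusions reached from Table 12.3.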

12.2 Quality Control Charts.

In the discussion of runs we talked about testing non-randomness with 
respect to bunching of a’s and b’s (i.e., bunching of values above the median 
line and below the median line). But in many cases we find it useful to ask 
whether the sample is behaving in a random fashion with respect to "excessively 
high" values or "excessively low" values of the measurements. For example, in 
sampling articles coming off a production line, one may be interested in breaking
strength, weight, or a critical length measurement of each article. One of
the most immediate and foolproof indications that something is going wrong with 
the manufacturing process with respect to one or more of these important 
characteristics is the appearance of "excessively high" or "excessively low" 
values of the measurements in the sample. 

So the question that arises is: What does one mean by an "excessively
high" or an "excessively low" value of a measurement? How can we make it 
definite and unambiguous when such a value has occurred so that one can be 
reasonably sure that something has crept into the manufacturing process to cause 
such a value? 

A practical procedure which is very widely used in industry in connection
with mass production operations has been developed for answering these
questions. The procedure is carried out graphically on what are called quality
control charts.

The simplest way to show what a quality control chart is, how it is
established for a given mass production operation, and how it is used, is to 
consider an example. 

Example: In a certain plant rheostat knobs are mass produced by a plastic
molding process. Each knob contains a metal insert. A certain dimension is
critical in the fitting of this knob into the assembly for which the knob is
intended. This critical dimension is affected by slight variations in the size
of the molded-in metal part and in the molding operation.

In establishing a quality control chart for this critical dimension,
a sample of 5 knobs was taken every hour from the molding machine and the critical
dimension measured on each of the 5 knobs in the set. The mean of this
sample of 5 measurements was taken. This procedure was repeated for 25 successive
working hours and the data in Table 12.4 were obtained (Grant data).


TABLE 12.4

In Table 12.4 it will be seen that the mean and range of the measurements
for the critical dimension for each sample of 5 are given in the last two
columns.

The value of X̄, the mean of the sample means, and the value of R̄, the
mean of the sample ranges, are shown at the bottom of Table 12.4. Now suppose
we plot each mean against the corresponding sample number in the order in which
it was drawn. We get the 25 points as plotted in Figure 12.2. In actual practice,
each sample mean is plotted on the graph as soon as it is obtained, and
straight lines connecting successive points are drawn; this is done for visual
convenience in following the means from sample to sample (i.e., from hour to
hour in this case).



Figure 12.2 

We now draw the following three horizontal lines on the graph:

(1) The central line through the mean X̄ of all 25 sample means, i.e.,
through 140.6.

(2) The upper control limit through the value X̄ + 3(.193R̄), i.e.,
through the value 140.6 + 3(.193)(8.7) = 140.6 + 5.0 = 145.6.

(3) The lower control limit through the value X̄ - 3(.193R̄), i.e.,
through 140.6 - 3(.193)(8.7) = 140.6 - 5.0 = 135.6.
These three lines are drawn as solid lines to Sample No. 25 and dotted
from there on. The graphical result of these operations is called a quality control
chart for means. Note that if we were to push all these 25 points horizontally
to the left so as to pile them up against the vertical axis, we would simply
have a dot diagram of 25 points with mean 140.6 and the two control limits
140.6 ± 5.0. What we have done in constructing the quality control chart is essentially
this: We have taken the 25 successive sample means as a sort of temporary
working standard of the amount of variability in the critical dimension. If the
variation of the critical dimension were "purely random" from one rheostat knob to
another, then one could consider that he had an indefinitely large population
in which this dimension would follow some unknown probability distribution
(probably not violently different from a normal distribution). Then means of
samples of 5 from this population would tend to be much more nearly normally
distributed than individual measurements (i.e., samples of 1). The larger the
sample drawn each hour the more nearly normal will be the distribution of sample
means if the critical dimension varies in a purely random way. In practice,
however, it has been found that samples of 5 are satisfactory.

If we knew the mean μ_X̄ and standard deviation σ_X̄ of the theoretical
sampling distribution of means of samples of 5 from the population of knobs,
then we could conclude, from the normal probability table (Table 8.1), that
about 99.74% of means of 5 would lie between μ_X̄ ± 3σ_X̄. But we do not know
μ_X̄ and σ_X̄. We therefore use the mean X̄ of the 25 sample means of 5 as an
estimate of μ_X̄ and the quantity .193R̄ as an estimate of σ_X̄, where R̄ is the
average of the 25 sample ranges. We can then state that, if the critical dimension
continues to vary in a "purely random" manner from knob to knob as it did
during the period covered by the 25 samples, then about 99% (99.74% to be more
precise) of means of samples of 5 will fall between X̄ ± 3(.193R̄), i.e., between
the dotted control limits, in future sampling.

The justification of the value .193R̄ as an estimate of σ_X̄, on the
assumption that the critical dimension varies in a "purely random" way, is beyond
the scope of this course. We could have estimated σ_X̄ = σ/√5 by pooling the
variances of the 25 samples just as we pooled two sample variances to obtain
(10.35) as the estimate of σ in the two-sample case. In practice, however, this
involves a great deal of computation. Estimating σ_X̄ by using .193R̄ is quite
satisfactory when as many as 25 samples of 5 measurements each are involved.

The factor .193 is used only when samples of 5 measurements are used. The
factors for samples of sizes 3, 4, 6, 7, 8, 9, 10 are .341, .243, .161, .140,
.124, .112, .103, respectively.
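The arithmetic of the control limits can be packaged as a small Python sketch. The factor table is the one quoted above; the function names, and the helper that lists samples falling outside the limits, are our own illustration rather than a procedure given in the text.

```python
# Multipliers of the average range R-bar used to estimate the standard
# deviation of sample means, keyed by sample size (values quoted in the text)
RANGE_FACTORS = {3: 0.341, 4: 0.243, 5: 0.193, 6: 0.161, 7: 0.140,
                 8: 0.124, 9: 0.112, 10: 0.103}


def control_limits(grand_mean, mean_range, sample_size):
    """Return (lower limit, central line, upper limit) for a chart of means."""
    half_width = 3 * RANGE_FACTORS[sample_size] * mean_range
    return grand_mean - half_width, grand_mean, grand_mean + half_width


def samples_outside(means, limits):
    """Return the 1-based sample numbers whose means fall outside the limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(means, start=1) if x < lower or x > upper]


limits = control_limits(140.6, 8.7, sample_size=5)  # rheostat-knob example
print(", ".join(f"{v:.1f}" for v in limits))         # prints 135.6, 140.6, 145.6
```

In use, each new hourly mean would be passed to `samples_outside` together with the fixed limits, and any sample number it returns would prompt an examination of the process.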

In practice we continue to take a sample of 5 knobs every hour
beyond the 25th hour to see whether the sample means continue to fall within
the dotted control lines 140.6 ± 3(.193R̄). If a point falls outside the region
bounded by the dotted lines, it is considered that something has gone
wrong with the process and an examination of the process is made to see what
the trouble might be. Of course, even when nothing has gone wrong, there is
a probability of less than 1% of a point falling outside the control limits. But
the engineer takes this amount of risk of looking for trouble when it is not
present, just to make sure that a cause of trouble which might exist does not
go undetected. In other words, when a point falls outside he will bet more on
process trouble than on pure chance as the cause of the point falling outside.

If, as the sampling proceeds, hour by hour one finds the means of
samples of 5 jumping about in a haphazard way between the control limits, we say
that the manufacturing process is under statistical control with respect to the
critical dimension under consideration. What usually happens in practice when
a quality control chart procedure is introduced is this: After establishing
control limits on the basis of 25 or more samples, and after searching for and
eliminating causes for process trouble every time a point falls outside the
control limits, one soon finds that the variability from mean to mean becomes
smaller than it was for the initial set of 25 samples. One can then take a new
set of samples and establish a new central line and new upper and lower control
limits. The control limits, in practice, will usually be closer together than
the original ones. After a few stages of this kind one arrives at a state of
statistical control involving control limits which are about as close together
as one can hope to make them without revolutionary changes in the manufacturing
process.

If you look at Figure 12.2 you will see that the mean of each of the
25 samples drawn is well within the control limits; this fact indicates that no
causes of exceptional variation have crept into the manufacturing process. The
tolerance limits specified by the engineering designer for this problem were
140.6 ± 5.0 (in units of .001 in.); this means that any rheostat knob having its
critical dimension outside these limits will be rejected. Looking at Table 12.4
you will see that there are 19 knobs (about 14%) among the 125 in that table that
would be rejected upon inspection. This is a high figure for rejections. In
good manufacturing practice, the percentage of rejections should not be more
than 1% to 5%. Since the present manufacturing process seems to be nicely
under statistical control, the only way to get the variation in critical dimension
down to the point where the rejects would not be more than 1% to 5% would
probably be to make a radical change in manufacturing the metal parts and in
the molding process, i.e., to "tighten" up on them so there would not be so
much variation in either of them.

Quality control charts are very widely used in industry. They provide 
a very simple and effective way to see graphically what goes on in successive 
sampling so as to be in a position to know when a drawing yields an excessively 
large or excessively small value, and to take appropriate action when it occurs. 
If one should wish to examine the data in a control chart a little more closely
he could make a run test on it.

Control charts and runs provide simple and practical methods of checking
randomness in samples from any indefinitely large population. This randomness
is necessary in order for sampling theory to be applicable. The usual
practice in statistics is to assume that the randomness is good enough without 
actually checking it. Some of the serious pitfalls in statistics occur in 
sampling situations in which the requirement of randomness is not satisfactorily 
fulfilled. 

Quality control charts can be constructed for sample statistics other
than means, e.g., sums, ranges, etc. In fact, control charts for means and
ranges are usually run parallel to each other on the same samples and on the same
sheet of paper, the range chart being placed directly below the mean chart.

If one has enough information, theoretical or experimental, about a
population from which one is sampling, it is possible to set up the control
limits on a quality control chart completely in advance of any sampling. For
example, suppose a person proposes to make a critical study of the performance
of a die thrown under a given set of conditions (the die may be shaken in a
leather cup, thrown on a card table, etc.). Suppose the study is made by
considering means of the number of dots obtained in successive samples of 10
throws. If the hypothesis of perfect performance is set up, then the central
line and the two control limits can be established without any performance
data from the die. In fact, the central line would pass through the mean 3.5,
and, since a single throw of a perfect die has variance 35/12 so that a mean of
10 throws has variance 35/120, the control lines would pass through
3.5 ± 3√(35/120) = 3.5 ± 1.62, i.e., through 1.88 and 5.12.
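The limits for the die example follow from the mean 3.5 and variance 35/12 of a single throw of a perfect die, the variance of a mean of n independent throws being 35/(12n). A minimal Python sketch (the name `die_chart_limits` is our own):

```python
from fractions import Fraction
from math import sqrt

FACES = range(1, 7)
MEAN = Fraction(sum(FACES), 6)                               # 7/2
VARIANCE = Fraction(sum(f * f for f in FACES), 6) - MEAN**2  # 35/12


def die_chart_limits(throws_per_sample):
    """(lower limit, central line, upper limit) for means of n throws of a perfect die."""
    sd_of_mean = sqrt(float(VARIANCE / throws_per_sample))  # sigma / sqrt(n)
    center = float(MEAN)
    return center - 3 * sd_of_mean, center, center + 3 * sd_of_mean


print([round(v, 2) for v in die_chart_limits(10)])  # prints [1.88, 3.5, 5.12]
```

Keeping the mean and variance as fractions makes the exact values 7/2 and 35/12 visible before converting to decimals.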

Exercise 12.

1. Consider the first 3 columns of data in Table 2.1 as a sample of 45 individual
measurements drawn in the order 1.47, 1.62, 1.57. Make a run test of
these measurements for randomness at the 1% probability level. (Test for too
many runs as well as too few.)

2. You are supposed to have the original data on at least one of the following
problems in Exercise 2.2: Nos. 8, 9, 10, 11, 12, 13, 14. Make a run test on
your data at the 1% probability level.

3. Do the best you can to write down what you would consider to be a random
sequence of 31 numbers between 500 and 1000. Make a run test on this sequence
of numbers at the 1% probability level.

4. The number of divorces per 1000 persons in the U. S. for each of the years
1920 to 1940 was: 1.6, 1.5, 1.4, 1.5, 1.5, 1.5, 1.5, 1.6, 1.6, 1.7, 1.6, 1.5,
1.3, 1.3, 1.6, 1.7, 1.8, 1.9, 1.9, 1.9, 2.0. Test this sequence for randomness
by the method of runs at the 1% probability level. Interpret your results.

5. Make a quality control chart for the data in Example No. 7 of Exercise 2.2,
using means of samples of 5. The first sample consists of the first row of
numbers across the page, the second sample consists of the second row, etc.

6. Using the probability distribution in Example No. 5 of Exercise 9.3, set up
the central line and control limits for a quality control chart for the total
number of aces in successive sets of 10 hands of bridge, which you would use in
studying the thoroughness of shuffling as measured by number of aces dealt.

7. The following means and ranges of muzzle velocities (in ft./sec.) were
obtained for samples of five from 25 consecutive lots of ammunition (Simon data):
Lot    Mean    Range
  1    1710     42
  2    1711     40
  3    1713     39
  4    1718     26
  5    1735     10
  6    1739     25
  7    1723     14
  8    1741     15
  9    1738     11
 10    1725     31
 11    1731     25
 12    1721     19
 13    1719     43
 14    1735     39
 15    1741     17
 16    1783     51
 17    1777      9
 18    1794     15
 19    1773     37
 20    1789     54
 21    1798     15
 22    1789     29
 23    1788     39
 24    1799     30
 25    1807     44

X̄ = 1751.88    R̄ = 28.76


Construct a quality control chart for the means in this problem. 

8. Suppose you have 16 dice of a certain kind (all look exactly alike).
Construct the central line and control limits for a quality control chart you
would use in studying whether the total number of dots appearing on these dice
is behaving as one would expect for unbiased dice, assuming that all 16 dice
are thrown successively. The following set of data shows the total number of
dots which appeared in 100 throws of 16 5-cent dice. Try your control chart
(with its central line and control limits) on these data:


61  57  66  48  63  49  56  57  66  61
54  62  57  63  59  56  54  60  56  62
54  62  60  50  52  62  60  47  63  32
52  53  65  57  56  61  57  63  67  47
63  51  55  53  57  58  55  54  55  53
53  67  59  59  67  47  65  55  59  53
59  56  54  65  57  50  68  53  59  63
49  51  67  69  56  61  52  56  51  63
51  64  58  52  55  65  54  71  52  56
52  54  55  71  57  58  57  69  53  45




CHAPTER 13. ANALYSIS OF PAIRS OF MEASUREMENTS


13.1 Introductory Comments.

In all the preceding chapters we have dealt with elementary statistical
analysis of samples involving only a single measurement, and with
elementary probability analysis of one chance quantity X. Many statistical
problems arise in which the sample consists of pairs (triplets, or a higher
number) of measurements.

In some cases the relationship between the two measurements may be
very strong and the statistical analysis required may be very simple. For
example, a chemist interested in finding a quick method of determining alpha-resin
content of hops first studies the relationship between colorimeter readings for
certain standard flasks containing various concentrations of alpha-resin. He
performs an experiment and obtains the following pairs of measurements (Bullis
and Alderton data):

TABLE 13.1

Colorimeter reading     Concentration of alpha-resin
        X               (milligrams per 100 cc.)  Y

        8                         .12
       50                         .71
       81                        1.09
      102                        1.38
      140                        1.95
      181                        2.50


The simplest way to analyze these data is to plot them on graph paper as six points,
the X and Y coordinates of the six observed points being (8, .12), (50, .71),
(81, 1.09), (102, 1.38), (140, 1.95) and (181, 2.50), and to note that the points
lie almost on a straight line. In fact, if a straight line were fitted "by eye",
it would be quite accurate. The result of plotting the points and the straight
line is shown in Figure 13.1.

From Figure 13.1 we can make a reasonably accurate and quick estimate
of the concentration of alpha-resin from a given colorimeter reading for a flask
containing an unknown concentration of alpha-resin. For instance, if a standard
flask containing an unknown concentration of alpha-resin has a colorimeter reading
of 125, we would estimate the alpha-resin concentration to be 1.75 mg/100 cc.,
as you will see from Figure 13.1. A colorimeter reading is easy and quick to
make, but a direct determination of the alpha-resin content in the unknown
concentration would be very time-consuming.


Figure 13.1. Graph of the Data in Table 13.1 and a Straight Line Fitted "by eye" (vertical axis: concentration of alpha-resin).
The line determined from the observed points in Figure 13.1 for estimating
concentration of alpha-resin (Y) from colorimeter reading (X) may be
referred to as a regression line of Y on X. We have used the line in Figure
13.1 as a graphical regression line of alpha-resin concentration (Y) on
colorimeter reading (X). Because the fit is so close, this particular line can
also be used as a graphical regression line of colorimeter reading (X) on
alpha-resin concentration (Y). We would use it in the latter way if we wished to
estimate the colorimeter reading for a known alpha-resin concentration. In
case the observed points do not lie closely along a straight line, there will
be two regression lines as will be discussed in section 13.22 (see Figure 13.3).

Many other simple examples of this type could be given, in which the
plotted points would fall almost along a straight line or some kind of a smooth
curve. Satisfactory elementary statistical analysis for such examples consists
of plotting the points and drawing the line or curve by eye, and using the resulting
line or curve for estimating one of the values of a pair of measurements
from the other measurement. Fitting "by eye" can be done more effectively by
using certain tools to aid the eye. In fitting straight lines, a piece of
transparent celluloid or plastic with a fine black line is best. A piece of
fine black thread stretched from hand to hand is good. The edge of a transparent
triangle is satisfactory. In fitting curved lines, drafting curves or splines
are effective.

In problems where the plotted points do not fall nearly along a line
or a curve, the problem of statistical analysis of the relationship between the
two variables is usually more complicated. We can, of course, still try to
draw some kind of a line or curve through the points "by eye" so as to fit them
as "best" we can. The difficulty is that if we should try to repeat this operation
on the same set of points, or if several people were to try to do it,
they might get quite different results because of the rather wide scatter of the
points. Hence some procedure for fitting the line or curve is needed which will
result in greater consistency from repetition to repetition than fitting "by eye".
Not only do we need a procedure to fit a line or a curve objectively, but we need
some way of measuring the amount of scatter of the points about the curve. The
main purpose of this chapter is to give methods for handling these problems. The
most widely used method for objectively fitting lines and curves is the method
of least squares. We shall consider this method for fitting straight lines and
curves.


Exercise 13.1.

1. The following data were obtained in a certain series of experiments on the
relationship between concentration of penicillin solution in units/ml. (X) and
mean circle diameter of the zone of inhibition in mm. (Y):


X       1       2       4       8       16      32
Y     15.87   17.78   19.52   21.35   23.13   24.77


Graph these six points and construct a regression line "by eye". From this
regression line, estimate the mean circle diameter of the zone of inhibition for 25
units/ml. of penicillin solution. How many units of the penicillin solution
would be required to produce a mean circle diameter of 20 mm.?

2. Specimens of several brands of mayonnaise were analyzed for fat content by
a rapid method and a method of the Association of Official Agricultural
Chemists (A.O.A.C.). Denoting the result of the Rapid Method by X and that by
the A.O.A.C. method by Y, the following data were obtained (by Kaufman):


  X        Y
 80.5     79.3
 30.3     30.4
 25.2     26.0
 77.4     77.9
 48.1     47.5
 35.7     35.3
 18.6     18.7


Graph these seven points and fit a regression line "by eye". If the Rapid
Method showed a fat content of oOp for a certain specimen, what percent content
would you estimate the A.O.A.C. Method to show for the same specimen?

3. The following data were obtained in an experiment to study the relationship
between the amount of beta-erythroidine in mg/l. (X) in an aqueous solution
and colorimeter reading of turbidity (Y) of the solution:
X      40     50     60     70     80     90
Y      69    175    272    335




Graph these six points and construct a regression curve "by eye". Estimate the
concentration of beta-erythroidine in a solution if the colorimeter reading is
370. What concentration of beta-erythroidine is required to give a colorimeter
reading of 200?


13.2 The Method of Least Squares for Fitting Straight Lines.

13.21 An example.

We shall first consider the following

Example: In a class of 17 students in a sophomore mathematics course at
Princeton, the following Term Scores (X) and Final Examination Scores (Y) were obtained:


TABLE 13.2

Term Score     Final Examination Score
    X                    Y

   50                   44
   44                   25
   54                   44
   46                   26
   64                   41
   37                   21
   77                   60
   62                   52
   45                   15
   69                   55
   44                   42
   57                   58
   63                   37
   68                   36
   45                   40
   71                   58
   50                   36


We are to fit a straight line to these data which can be used for making an
estimate of the Final Examination Score (Y) for a student for whom only the Term
Score (X) is available.

The first thing to do is to plot the data as 17 dots in the XY plane,
as shown in Figure 13.2. The resulting plot is called a scatter diagram. Note
that if the 17 dots were moved vertically downward to the X axis we would have
a dot frequency diagram of X (the Term Score). Similarly, if the dots were moved
horizontally left to the Y axis, we would have a dot frequency diagram of Y (the
Final Examination Score). The distribution of the X scores has its own mean and
variance. Similarly for the distribution of Y scores.


Figure 13.2. Scatter diagram of the data in Table 13.2 (vertical axis: Final Examination Score).
Usually, some useful and interesting information can be obtained by
inspection from a scatter diagram. For example, suppose 60 is the passing
score on the Term Score and 50 is the passing score on the Final Examination.
By drawing a vertical line through X = 60 and a horizontal line through Y = 50,
we cut the plotted points into four sets, falling into four quadrants. The
four points in the upper right quadrant correspond to 4 students who passed on
both Term and Examination. The 9 points falling into the lower left quadrant
correspond to students who failed on both. One student passed on the Examination
and failed on the Term, while 3 passed on the Term and failed on the Examination.
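The quadrant counts can be verified mechanically. In this Python sketch the data are typed in from Table 13.2, taking the last pair as (50, 36) as in the sum (13.2); the function name and the convention that a score equal to the passing mark counts as a pass are our own assumptions (no score here falls exactly on either line). The four counts come out as 4, 9, 1 and 3, which account for all 17 students.

```python
# (Term Score X, Final Examination Score Y) pairs from Table 13.2; the last
# pair is taken as (50, 36), the value used in (13.2) and consistent with
# the sums appearing in (13.5)
SCORES = [(50, 44), (44, 25), (54, 44), (46, 26), (64, 41), (37, 21),
          (77, 60), (62, 52), (45, 15), (69, 55), (44, 42), (57, 58),
          (63, 37), (68, 36), (45, 40), (71, 58), (50, 36)]


def quadrant_counts(pairs, term_pass=60, exam_pass=50):
    """Count students in each quadrant cut out by the two passing lines.

    Assumption: a score equal to the passing score counts as a pass.
    """
    counts = {"passed both": 0, "failed both": 0,
              "passed exam only": 0, "passed term only": 0}
    for x, y in pairs:
        term_ok, exam_ok = x >= term_pass, y >= exam_pass
        if term_ok and exam_ok:
            counts["passed both"] += 1
        elif not term_ok and not exam_ok:
            counts["failed both"] += 1
        elif exam_ok:
            counts["passed exam only"] += 1
        else:
            counts["passed term only"] += 1
    return counts


print(quadrant_counts(SCORES))
```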

Now if we want to study the relationship between the Term Score and
Examination Score, the question is this: How do we construct a line through
the scattered points which could be used for estimating Y from a given X? If
we were to try to do it several times "by eye" the result might not be very
consistent from trial to trial. Or if several persons were to try to fit by
such a method the resulting line might not be very consistent from person to
person. This variability would certainly be greater for the scatter diagram
of Figure 13.2 than for that of Figure 13.1. We need an objective way to fit
a line.

Now, the equation of any straight line may be written in the form

(13.1)    Y = a + bX

where a and b are constants. Every straight line that can be drawn on Figure
13.2 can be obtained from (13.1) by substituting some numerical values for a
and b. An objectively fitted regression line is one of these lines. The problem
of objectively fitting the line (13.1) to the data therefore amounts to
objectively determining values of a and b. The principle we shall use to determine
a and b is this: Use formula (13.1) for estimating the value of Y from
the X value of each of the 17 points. Then square the difference between the
actual value of Y and the estimated value of Y for each point and sum these
squares. Choose values of a and b so as to minimize this sum of squares of
differences or residuals. If we think of these differences as errors in estimating
values of Y from values of X, then what we are really doing is to choose
a and b so as to minimize the sum of squares of the errors.

The values of a and b determined in this manner will be called the
least squares values of a and b.
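The least squares principle can be sketched directly in Python. The function below (names are our own) solves in closed form the two conditions for S to be a minimum, using exact fractions; these are the same equations that are worked out by hand below, so it should agree with the values found there. The data are those of Table 13.2, with the last pair taken as (50, 36) as in (13.2).

```python
from fractions import Fraction

# (X, Y) pairs of Table 13.2, last pair taken as (50, 36)
SCORES = [(50, 44), (44, 25), (54, 44), (46, 26), (64, 41), (37, 21),
          (77, 60), (62, 52), (45, 15), (69, 55), (44, 42), (57, 58),
          (63, 37), (68, 36), (45, 40), (71, 58), (50, 36)]


def least_squares_line(pairs):
    """Least squares a and b for Y = a + bX, solving the two normal equations exactly."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    det = n * sxx - sx * sx
    b = Fraction(n * sxy - sx * sy, det)
    a = Fraction(sxx * sy - sx * sxy, det)
    return a, b


def sum_of_squares(pairs, a, b):
    """S: the sum of squared differences between actual and estimated Y."""
    return sum((y - a - b * x) ** 2 for x, y in pairs)


a, b = least_squares_line(SCORES)
print(round(float(a), 4), round(float(b), 4))  # prints -6.1215 0.8394
```

Because S is a quadratic in a and b with a single minimum, any change to the fitted a or b, however small, makes `sum_of_squares` larger.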
Now for the details. The actual value of X for the first point
(first entry in Table 13.2) is 50. The actual value of Y is 44. The estimated
value of Y is

$$a + 50b,$$

obtained by substituting X = 50 in formula (13.1). The squared difference
between the actual and estimated values of Y is

$$(44 - a - 50b)^2.$$


Doing 

a similar 

operati 

on on all 

of th 

: 0 

poi 

nts 

in 

the scatter 

diagram 

(i.e., 

pairs 

of scores 

in Tabl 

e 13.2), w. 

"■e hav 

'0 

for 

the su 

rn of the 17 

squared 

differ- 

ences; 












(13.2; 

) s 

a-50b)^+ 

(25-a-44b 

') + ( 

44 

-a~ 

54b; 


.,. + (36-a- 

50b)^ . 



Now/ w 

^e must o' 

hoose the 

value 

:S 

of 

a a; 

!id b 

so as to m 

ahe the ^ 

Aalue of 


S as small as possible. Such values are obtained by setting the following two 
derivatives equal to z€^ro and solving simultaneously for a and b : 


Carrying out the tv/o differentiations v/ith respect to a and b., we fii 


$$\begin{aligned}
-2(44-a-50b) - 2(25-a-44b) - \cdots - 2(36-a-50b) &= 0 \\
-2(50)(44-a-50b) - 2(44)(25-a-44b) - \cdots - 2(50)(36-a-50b) &= 0 .
\end{aligned} \tag{13.4}$$

Dividing each equation by two and performing the arithmetic, we find that the
two equations simplify to


$$\begin{aligned}
-690 + 17a + 946b &= 0 \\
-40{,}238 + 946a + 54{,}836b &= 0 .
\end{aligned} \tag{13.5}$$

Let $\hat a$ and $\hat b$ be the values of a and b which satisfy these
equations. Multiplying the first equation by 946/17 = 55.6471 and subtracting
it from the second, we find

$$\hat b = .8394 .$$

Substituting this value for b in the first of the equations in (13.5), we find

$$\hat a = -6.1215 .$$


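The two equations (13.5) may also be solved by the use of determinants as
follows:

$$\hat b = \frac{\begin{vmatrix} 17 & 690 \\ 946 & 40{,}238 \end{vmatrix}}{\begin{vmatrix} 17 & 946 \\ 946 & 54{,}836 \end{vmatrix}} = \frac{(40{,}238)(17) - (690)(946)}{(54{,}836)(17) - (946)^2} = \frac{31{,}306}{37{,}296} = .8394$$

and

$$\hat a = \frac{\begin{vmatrix} 690 & 946 \\ 40{,}238 & 54{,}836 \end{vmatrix}}{\begin{vmatrix} 17 & 946 \\ 946 & 54{,}836 \end{vmatrix}} = \frac{(54{,}836)(690) - (946)(40{,}238)}{37{,}296} = \frac{-228{,}308}{37{,}296} = -6.1215 .$$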

Hence the equation of the least squares line for estimating Y from X is

(13.6) Y = -6.1215 + .8394X .


This line is called the regression line of Y on X. With the coefficients
rounded off to two decimal places (which provides sufficient accuracy for the
present problem), the graph of the line is shown in Figure 13.2. It is an
objectively obtained line which we would use for estimating the Y score (Final
Examination Score) for a student of the class whose X score (Term Score) is
known. For instance, suppose a student in the class having a Term Score of 70
is unable to take the Final Examination. What estimate would we make for his
Final Examination Score if he had not taken the Examination? Substituting in
(13.6), we would have

Y = -6.12 + .84(70)
  = 52.68

or, rounding off, we would estimate his Final Examination Score to be 53.
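The determinant solution and the estimate for a Term Score of 70 can be checked with a short Python sketch. It assumes only the five sums appearing in (13.5): S(1) = 17, S(X) = 946, S(Y) = 690, S(X²) = 54,836 and S(XY) = 40,238. The function name `solve_normal_equations` is hypothetical.

```python
def solve_normal_equations(n, sx, sy, sxx, sxy):
    """Solve  n*a + sx*b = sy  and  sx*a + sxx*b = sxy  by Cramer's rule.

    These are equations (13.5) rearranged, with n = S(1), sx = S(X),
    sy = S(Y), sxx = S(X^2) and sxy = S(XY).
    """
    det = n * sxx - sx ** 2  # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("all X values are equal; no unique solution")
    b = (n * sxy - sx * sy) / det
    a = (sxx * sy - sx * sxy) / det
    return a, b


a_hat, b_hat = solve_normal_equations(n=17, sx=946, sy=690, sxx=54_836, sxy=40_238)
print(round(a_hat, 4), round(b_hat, 4))  # -6.1215 0.8394

# Estimated Final Examination Score for a Term Score of 70.
estimate = a_hat + b_hat * 70
print(round(estimate, 2), round(estimate))
```

With the unrounded coefficients the estimate is about 52.64 rather than the 52.68 obtained from the rounded coefficients; either way it rounds to 53.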
13.22 The general case.

It will be seen by examining equations (13.4) and (13.5) that the constants
and coefficients occurring in (13.5) are obtained by performing summation
operations on X, Y, X², and XY. More specifically, if we let
$(X_1, Y_1), (X_2, Y_2), \ldots, (X_{17}, Y_{17})$ denote the 17 pairs of
measurements in our sample, and if we are interested in estimating values of Y
from values of X, then equations (13.5) may be written as


$$\begin{aligned}
-\sum_{j=1}^{17} Y_j + 17a + b\sum_{j=1}^{17} X_j &= 0 \\
-\sum_{j=1}^{17} X_jY_j + a\sum_{j=1}^{17} X_j + b\sum_{j=1}^{17} X_j^2 &= 0 .
\end{aligned} \tag{13.5a}$$


Or, writing these with our abbreviated notation introduced in Chapter 3 (and
noting that the coefficient 17 in the first equation may be written as
$\sum_{j=1}^{17}(1)$, or S(1) in this example), we have

$$\begin{aligned}
-S(Y) + a\,S(1) + b\,S(X) &= 0 \\
-S(XY) + a\,S(X) + b\,S(X^2) &= 0 .
\end{aligned} \tag{13.7}$$


The values of the coefficients may be conveniently found by setting up a table
such as Table 13.3. Unless one has a computing machine, it pays to use a
simpler computing scheme than that involved in Table 13.3. Simplified
computing schemes require special discussion and illustration, and will be
considered in Section 13.3. The column of values of Y² is given for later use.

If we consider a sample of n pairs of measurements
$(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ and fit the line Y = a + bX by the
method of least squares, we will end up with equations of form (13.7) for
determining a and b. Consequently, we can consider equations (13.7) as general
enough to hold for any sample of n pairs of measurements. In the general case
S(1) = n.

In fitting the line Y = a + bX so as to be able to estimate values of Y from
values of X, we refer to X as the independent variable or predictor and Y as
the dependent variable or predictand.

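Considering (13.7) as general equations, let us solve them for a and b.
Solving by determinants, exactly as was done above for the 17 pairs of scores,
and denoting the solutions by $\hat a$ and $\hat b$, we obtain

$$\hat b = \frac{n\,S(XY) - S(X)\,S(Y)}{n\,S(X^2) - [S(X)]^2}, \qquad \hat a = \frac{S(X^2)\,S(Y) - S(X)\,S(XY)}{n\,S(X^2) - [S(X)]^2} . \tag{13.8}$$

Dividing the numerator and denominator of $\hat b$ by n(n-1), we may write
$\hat b$ in (13.8) as follows:

$$\hat b = \frac{\mathrm{cov}(X,Y)}{s_X^2}, \qquad \text{where} \quad \mathrm{cov}(X,Y) = \frac{1}{n-1}\left[S(XY) - \frac{1}{n}S(X)\,S(Y)\right] . \tag{13.8a}$$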
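The general solution can be turned into a small routine. The Python sketch below computes $\hat b$ = cov(X,Y)/$s_X^2$, with cov(X,Y) = [S(XY) - S(X)S(Y)/n]/(n - 1), and then $\hat a = \bar Y - \hat b\bar X$, which follows from the first equation of (13.7). The helper names `covariance` and `least_squares_line` are hypothetical, and the five data pairs in the usage example are made up.

```python
from statistics import mean


def covariance(xs, ys):
    """cov(X, Y) by the computing formula [S(XY) - S(X)S(Y)/n] / (n - 1)."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (sxy - sum(xs) * sum(ys) / n) / (n - 1)


def least_squares_line(xs, ys):
    """Return (a_hat, b_hat) for the line Y = a + bX fitted by least squares."""
    var_x = covariance(xs, xs)  # cov(X, X) is the sample variance s_X^2
    if var_x == 0:
        # All X values equal: equations (13.7) have no unique solution.
        raise ValueError("all X values are equal; no line can be fitted")
    b_hat = covariance(xs, ys) / var_x      # b_hat = cov(X, Y) / s_X^2
    a_hat = mean(ys) - b_hat * mean(xs)     # from the first equation of (13.7)
    return a_hat, b_hat


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a_hat, b_hat = least_squares_line(xs, ys)
print(round(a_hat, 4), round(b_hat, 4))  # 2.2 0.6
```

At the fitted values the residuals Y - a - bX sum to zero and are uncorrelated with X; these are exactly the two conditions (13.7).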




From (13.8a) it is evident that equations (13.7) can always be solved if
$s_X^2$ is not zero; $s_X^2$ will be zero only if the X measurements all have
the same value, in which case we would not be able to study the relationship
between X and Y by least squares or any other method.

It will be seen that

$$S[(X-\bar X)(Y-\bar Y)] = S[XY - \bar X Y - X\bar Y + \bar X\bar Y] = S(XY) - \bar X\,S(Y) - \bar Y\,S(X) + n\bar X\bar Y = S(XY) - n\bar X\bar Y .$$

In other words, the covariance between X and Y may also be written as

$$\mathrm{cov}(X,Y) = \frac{1}{n-1}\,S[(X-\bar X)(Y-\bar Y)] .$$

Just as $\frac{1}{n-1}\left[S(X^2) - \frac{1}{n}[S(X)]^2\right]$ is the most
convenient formula for computing the value of $s_X^2$ (and similarly for
$s_Y^2$), the formula $\frac{1}{n-1}\left[S(XY) - \frac{1}{n}S(X)\,S(Y)\right]$
is the most convenient formula for computing the value of cov(X,Y).


We may write $\hat a$ in terms of $\hat b$ as follows:

$$\hat a = \bar Y - \hat b\,\bar X ,$$

and, substituting this value for a and $\hat b$ for b in the equation
Y = a + bX, we find that the equation of the straight line which fits the n
observed points "best" in the sense of least squares, when X is used to
estimate Y, is

$$Y - \bar Y = \hat b\,(X - \bar X) . \tag{13.9}$$

This line is called the regression line of Y on X and is used for estimating
the Y value of a pair of measurements if only the X value of the pair is known.


If we let $s_Y^2$ be the variance of the Y measurements in the sample, and let

$$r = \frac{S[(X-\bar X)(Y-\bar Y)]}{(n-1)\,s_X s_Y} = \frac{\mathrm{cov}(X,Y)}{s_X s_Y}, \tag{13.10}$$
then we may write

$$\hat b = r\,\frac{s_Y}{s_X}, \qquad \hat a = \bar Y - r\,\frac{s_Y}{s_X}\,\bar X . \tag{13.11}$$

Using these values for $\hat a$ and $\hat b$, the equation of the regression
line of Y on X may therefore be written in terms of $\bar X$, $\bar Y$, $s_X$,
$s_Y$ and r as follows:

$$(Y - \bar Y) = r\,\frac{s_Y}{s_X}\,(X - \bar X) . \tag{13.12}$$

This is merely another way of writing equation (13.9).

The quantity r as defined by (13.10) is called the correlation coefficient
between X and Y; its properties and uses will be discussed in Section 13.25.

The quantity $\hat b$, or $r\,s_Y/s_X$, which is the coefficient of X in the
equation of the regression line of Y on X, is called the regression coefficient
of Y on X. The constant term
$\hat a \; (= \bar Y - \hat b\bar X = \bar Y - r\,\frac{s_Y}{s_X}\bar X)$ is
called the Y intercept. It is the value of Y at which the regression line cuts
the Y axis. It will be seen from equation (13.12) that the regression line
always passes through the point $(\bar X, \bar Y)$, i.e., the point whose
X-coordinate is the mean of the X measurements and whose Y-coordinate is the
mean of the Y measurements.

It is clear from considerations of symmetry that if we should need to
estimate X for a pair of observations when the value of Y is known, we would
determine a line so that the sum of squares of differences between the actual
values of X and estimated values of X is a minimum. This leads to the
regression line of X on Y,

$$(X - \bar X) = r\,\frac{s_X}{s_Y}\,(Y - \bar Y) . \tag{13.13}$$

Note that the regression coefficient of X on Y is $r\,s_X/s_Y$ and not
$r\,s_Y/s_X$; i.e., it is $\mathrm{cov}(X,Y)/s_Y^2$ and not
$\mathrm{cov}(X,Y)/s_X^2$.
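The two regression coefficients are easy to compare numerically. This Python sketch (hypothetical helper `regression_slopes`, made-up data) computes the coefficient of Y on X, $r\,s_Y/s_X$, and of X on Y, $r\,s_X/s_Y$. Their product is $r^2$, so the two lines coincide only when r = +1 or -1.

```python
from math import sqrt


def regression_slopes(xs, ys):
    """Return (b_yx, b_xy, r) for paired data.

    b_yx is the regression coefficient of Y on X, r * s_Y / s_X, and
    b_xy is the regression coefficient of X on Y, r * s_X / s_Y.
    The common divisor n - 1 cancels in every ratio, so it is omitted.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / syy, sxy / sqrt(sxx * syy)


# Hypothetical data, for illustration only.
b_yx, b_xy, r = regression_slopes([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b_yx, 4), round(b_xy, 4), round(b_yx * b_xy, 4), round(r ** 2, 4))
```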

For any given scatter diagram of points such as that shown in Figure 13.3,
$$\bar X = \frac{946}{17} = 55.65, \qquad \bar Y = \frac{690}{17} = 40.59$$

$$s_X^2 = \frac{1}{16}\left[(54{,}836) - \frac{(946)^2}{17}\right] = 137.1176, \qquad s_X = 11.71$$

$$s_Y^2 = \frac{1}{16}\left[(30{,}902) - \frac{(690)^2}{17}\right] = 181.0074, \qquad s_Y = 13.45$$

$$\mathrm{cov}(X,Y) = \frac{1}{16}\left[(40{,}238) - \frac{1}{17}(946)(690)\right] = 115.0956$$

$$\hat b = \frac{115.0956}{137.1176} = .8394$$

$$r = \frac{115.0956}{\sqrt{(137.1176)(181.0074)}} = .7306 .$$

If these values of $\bar X$, $\bar Y$ and $\hat b$ were substituted in (13.12),
we would find, as the equation of the regression line of Y on X,

$$(Y - 40.59) = .8394\,(X - 55.65), \tag{13.14}$$

which is merely an alternative way of writing (13.6).
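All of the quantities above come from five sums of the sample. The Python sketch below reproduces them; S(Y²) = 30,902 is the sum that yields $s_Y^2$ = 181.0074, and the variable names are illustrative only.

```python
from math import sqrt

# Sums for the 17 pairs of scores.
n, SX, SY = 17, 946, 690
SXX, SYY, SXY = 54_836, 30_902, 40_238

x_bar, y_bar = SX / n, SY / n
var_x = (SXX - SX ** 2 / n) / (n - 1)    # s_X^2
var_y = (SYY - SY ** 2 / n) / (n - 1)    # s_Y^2
cov_xy = (SXY - SX * SY / n) / (n - 1)   # cov(X, Y)

b_hat = cov_xy / var_x                   # regression coefficient of Y on X
a_hat = y_bar - b_hat * x_bar            # Y intercept
r = cov_xy / sqrt(var_x * var_y)         # correlation coefficient, (13.10)

print(f"Xbar={x_bar:.2f}  Ybar={y_bar:.2f}  s_X={sqrt(var_x):.2f}  s_Y={sqrt(var_y):.2f}")
print(f"cov={cov_xy:.4f}  b={b_hat:.4f}  a={a_hat:.4f}  r={r:.4f}")
```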


13.23 The variance of estimates of Y from X.

If we were to use the regression equation (13.6) (or its alternative form
(13.14)) for making an estimate of Y from each value of X in Table 13.3, we
would find that these estimates differed somewhat from the actual values of Y.
For instance, the estimated value of Y corresponding to X = 50, obtained by
putting X = 50 in (13.6), is -6.1215 + .8394(50) = 35.85. The actual value of
Y is 44. The difference or residual is (44 - 35.85) = +8.15. Similarly, we can
find such a difference for each of the 17 pairs of measurements. While there
are 17 such residuals, there are only 15 degrees of freedom; this means that if
we know the values of 15 of the residuals, the remaining 2 are automatically
determined, because the residuals must satisfy the two equations (13.4). The
sum of the squares of these residuals, divided by the number of degrees of
freedom, is called the variance of the estimates of Y from X, or more briefly,
the variance of estimate, and will be denoted by $s_E^2$. Its square root,
$s_E$, is called the standard error of estimate.

Graphically speaking, the residuals we are talking about are the dotted
vertical segments in Figure 13.2. Those lying above the regression line are
positive and those below are negative. The variance of estimate $s_E^2$ is the
sum of the squares of the lengths of these segments divided by 15. We may,
therefore, regard $s_E^2$ (or $s_E$) as an index of the scatter of the points
about the regression line: the smaller $s_E$, the more closely the points
cluster about the line. In problems in which the scatter diagram is a roughly
elliptical cluster of points, this means that if we were to draw a line on
each side of the regression line, parallel to it and at a vertical distance
$s_E$ from it, approximately 68% of the points would lie between these two
lines; if the parallel lines were drawn at a vertical distance $2s_E$, about
95% of the points would lie between them.

Now let us consider the general case of n pairs of measurements. The estimate
of $Y_j$ for the jth pair of measurements $(X_j, Y_j)$ is found by putting
$X = X_j$ in equation (13.9):

$$\hat Y_j = \bar Y + \hat b\,(X_j - \bar X) .$$

The difference between the actual value and the estimate is

$$Y_j - \hat Y_j = (Y_j - \bar Y) - \hat b\,(X_j - \bar X) . \tag{13.15}$$

By definition, $s_E^2$ is the sum of the squares of (13.15), taken over all n
pairs of measurements, divided by n - 2:

$$s_E^2 = \frac{1}{n-2}\sum_{j=1}^{n}\left[(Y_j - \bar Y) - \hat b\,(X_j - \bar X)\right]^2 . \tag{13.16}$$

Squaring the quantity in brackets, we have

$$s_E^2 = \frac{1}{n-2}\left\{S[(Y-\bar Y)^2] - 2\hat b\,S[(X-\bar X)(Y-\bar Y)] + \hat b^2\,S[(X-\bar X)^2]\right\} . \tag{13.17}$$

But $S[(Y-\bar Y)^2] = (n-1)s_Y^2$, $S[(X-\bar X)^2] = (n-1)s_X^2$, and
$S[(X-\bar X)(Y-\bar Y)] = (n-1)\,r\,s_X s_Y = (n-1)\,\hat b\,s_X^2$,
as will be seen from (13.10). Hence (13.17) may be simplified and expressed as
follows:

$$s_E^2 = \frac{n-1}{n-2}\left(s_Y^2 - \hat b^2 s_X^2\right) . \tag{13.18}$$

By making use of (13.11) we may express $s_E^2$ alternatively as follows:

$$s_E^2 = \frac{n-1}{n-2}\,s_Y^2\,(1 - r^2) . \tag{13.18a}$$

Either of the formulas (13.18) or (13.18a) enables us to calculate the variance
of estimate without using all the individual differences between estimated and
actual values of Y.

It should be noted from (13.18) that when $\hat b = 0$ (or from (13.18a) that
when r = 0) we have $s_E^2 = \frac{n-1}{n-2}\,s_Y^2$, essentially the variance
of Y itself, which means that under these conditions the use of X is of no
value in estimating values of Y.

Returning to the example of 17 pairs of measurements, we have for the
variance of estimate

$$s_E^2 = \frac{16}{15}(181.0074)\left[1 - (.7306)^2\right] = 90.0159$$

and for the standard error of estimate

$$s_E = 9.488 .$$
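Formula (13.18a) and the definition (13.16) must agree. The Python sketch below (hypothetical function names; the data for the consistency check are made up) evaluates $s_E^2 = \frac{n-1}{n-2}s_Y^2(1-r^2)$ for the 17 pairs of scores, then computes $s_E^2$ both ways on a small data set.

```python
from math import sqrt


def variance_of_estimate(n, var_y, r):
    """s_E^2 = (n-1)/(n-2) * s_Y^2 * (1 - r^2), formula (13.18a)."""
    return (n - 1) / (n - 2) * var_y * (1 - r ** 2)


def variance_of_estimate_direct(xs, ys):
    """s_E^2 as the sum of squared residuals divided by n - 2, formula (13.16)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    residuals = [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]
    return sum(e * e for e in residuals) / (n - 2)


# The 17 pairs of scores: s_Y^2 = 181.0074 and r = .7306.
s2e = variance_of_estimate(17, 181.0074, 0.7306)
print(round(s2e, 4), round(sqrt(s2e), 3))  # 90.0159 9.488

# Consistency check on hypothetical data: both routes give the same value.
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
via_formula = variance_of_estimate(n, syy / (n - 1), sxy / sqrt(sxx * syy))
via_residuals = variance_of_estimate_direct(xs, ys)
print(round(via_formula, 6), round(via_residuals, 6))
```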


13.24 Remarks on the sampling variability of regression lines.

The regression line of Y on X, as determined by the method of least squares,
provides us with an objective method for making an estimate of the value of Y
for an additional pair of measurements in which the value of X is given. In
practice, such a regression line is determined from a sample of observed pairs
of measurements. If we imagine a population (either finite or indefinitely
large) of pairs of measurements, we can imagine a regression line of Y on X for
this population.

It is clear that if the sample regression coefficient $\hat b$ is zero, then
the regression line reduces to the form $Y - \bar Y = 0$. This means that the
regression line is parallel to the X axis and that we would get the same
estimate of Y no matter what value of X we have. In practice, we would rarely
find a sample regression coefficient equal to zero, although we may get values
close to zero sometimes. If we consider many samples from a population with a
given regression coefficient b, the sample regression coefficient $\hat b$ will
have values clustering around the value of b in the population, i.e., they will
have their own theoretical sampling distribution. This theoretical sampling
distribution is known under the assumption that the population is indefinitely
large and is such that if we consider the pairs of measurements having a
specific value of X, then Y is a chance quantity having a normal distribution
with mean a + bX and variance $\sigma^2$. This is the assumption usually made
when we consider sampling fluctuations of $\hat b$. In fact, if b is the value
of the regression coefficient in such a population and if $\hat b$ is the
value of the regression coefficient in a sample, then the quantity


$$t = \frac{(\hat b - b)\,s_X\sqrt{n-1}}{s_E} \tag{13.19}$$

will have the Student t distribution (see Section 10.4) with n - 2 degrees of
freedom. This means that if we choose a confidence coefficient α, we can say
that

$$\Pr\left\{-t_\alpha < \frac{(\hat b - b)\,s_X\sqrt{n-1}}{s_E} < t_\alpha\right\} = \alpha, \tag{13.20}$$

where the value of $t_\alpha$ is found from Table 10.2 for any specified number
of degrees of freedom. Expression (13.20) may be rewritten as

$$\Pr\left\{\hat b - \frac{t_\alpha s_E}{s_X\sqrt{n-1}} < b < \hat b + \frac{t_\alpha s_E}{s_X\sqrt{n-1}}\right\} = \alpha, \tag{13.21}$$

which means that the following expressions are 100α% confidence limits of the
value of the regression coefficient b in the population:

$$\hat b \pm \frac{t_\alpha\,s_E}{s_X\sqrt{n-1}} . \tag{13.22}$$

The assumptions under which the confidence limits (13.22) are exact should be
kept in mind. In practice, such assumptions are only approximately satisfied,
and (13.22) would only be considered as approximate 100α% confidence limits.

In the example of 17 pairs of scores, suppose we consider this as a sample of
17 drawn at random from a population of students taking the particular
mathematics course involved in the example. Substituting in (13.22) the
numerical values of $\hat b$, $s_E$, $s_X$ and n which we have found, and
substituting $t_\alpha$ = 2.131 (from Table 10.2 for α = .95 and 15 degrees of
freedom), we find the following 95% confidence limits of b:

$$.839 \pm \frac{(9.488)(2.131)}{(11.71)\sqrt{16}} = .839 \pm .432 .$$
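The arithmetic of (13.22), $\hat b \pm t_\alpha s_E/(s_X\sqrt{n-1})$, is sketched below in Python. The tabled value 2.131 is typed in rather than computed; if SciPy happens to be available, `scipy.stats.t.ppf(0.975, 15)` returns the same two-sided 95% value. The function name is hypothetical.

```python
from math import sqrt


def slope_confidence_limits(b_hat, s_e, s_x, n, t_alpha):
    """Limits b_hat -/+ t_alpha * s_E / (s_X * sqrt(n - 1)), as in (13.22)."""
    half_width = t_alpha * s_e / (s_x * sqrt(n - 1))
    return b_hat - half_width, b_hat + half_width


# t_alpha = 2.131: tabled Student t value for alpha = .95 and 15 degrees of freedom.
low, high = slope_confidence_limits(b_hat=0.8394, s_e=9.488, s_x=11.71, n=17, t_alpha=2.131)
print(f"95% limits for b: {low:.3f} to {high:.3f}")
```

The half-width is about .432, as in the text, so the limits are roughly .408 and 1.271.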


Similarly, we can establish confidence limits for the population value of the
intercept a, or for any quantity such as $a + bX_0$ (the mean of Y in the
population of pairs of measurements having $X = X_0$). These confidence limits
are a little more complex and will not be given here.


13.25 Remarks on the correlation coefficient.

The correlation coefficient r, defined by (13.10), is useful in certain kinds
of statistical problems as a simple index for expressing the degree of
relationship between two variables X and Y when the scatter diagram for the
sample consists of a swarm of points which is roughly elliptical in shape, and
when we wish to consider neither variable as a predictor for the other as in
the case of regression analysis. The value of r lies between -1 and +1.
Positive values of r indicate a positive relationship, the long direction of
the cluster running from lower left to upper right, so that as X increases, Y
increases (see (a) in Figure 13.5). Negative values of r indicate a negative
relationship, the long direction of the elliptical cluster running from upper
left to lower right, so that as X increases, Y decreases (see (b) in Figure
13.5). If r = +1, then all points on the scatter diagram will fall on a
straight line with a positive slope. Similarly, if r = -1, the points of the
scatter diagram fall along a straight line with negative slope. In either of
these cases the relationship between X and Y will be a perfect linear
relationship. If r = 0, the scatter diagram is such that the two regression
lines are parallel to the X and Y axes respectively, and (considering scatter
diagrams which are roughly elliptical in shape) we consider there to be no
relationship between X and Y. If the cluster of points is something like that
in (c) of Figure 13.5, the standard deviation of X exceeds that of Y and r will
be 0 (or nearly so). If the scatter diagram is circular, like (d) in Figure
13.5, the two standard deviations will be equal (or nearly so) and r will be 0
(or nearly so). If one should eliminate all points in the scatter diagram for
which X is less than some fixed value, we would get a truncated scatter
diagram (see (e) in Figure 13.5). The effect of such a truncation is, in
general, to lower the value of r. To get some notion of the size of r for
actual scatter diagrams showing various degrees of relationship between X and
Y, the values of r for (f), (g), (h) and (i) of Figure 13.5 are .37, .56, .72
and .99, respectively.

If a large number of samples is considered as being drawn at random from a
population of pairs of measurements having a correlation coefficient ρ, the
values of the correlation coefficient in these samples will have a sampling
distribution. The theoretical sampling distribution of a correlation
coefficient for a sample of a given size is very complicated. The mathematical
formula for it is known in case the population is indefinitely large and has a
two-variable normal or Gaussian distribution having a correlation coefficient
ρ. To attempt to discuss such a population would be beyond the scope of this
course. It is sufficient to say that such a distribution serves as a
satisfactory model for describing the way in which plotted points will be
distributed in large-sample scatter diagrams which occur in plotting pairs of
examination scores for a large number of students, heights and weights of a
large number of men, length and breadth of leaves of a given tree, and so on.

In spite of the complication of the theoretical sampling distribution of the
correlation coefficient, diagrams have been worked out for determining, from a
sample correlation coefficient r, confidence limits of the correlation
coefficient ρ of the population from which the sample is drawn. Figure 13.6
gives such a diagram for a confidence coefficient of 0.95.

As an example illustrating the use of Figure 13.6, suppose a sample of 25
pairs of measurements from a two-variable normal population yields a
correlation coefficient equal to .60. What can we consider to be the 95%
confidence limits of ρ in the population? The answer is found by entering the
scale of r (sample correlation coefficient) at r = +.60, and noticing where the
vertical line through r = +.60 cuts the two curves marked 25. Reading the two
cuts on the scale of ρ (population correlation coefficient), we find the 95%
confidence limits of ρ to be .27 and .79. Samples of only 25 pairs of
measurements do not determine very close confidence limits for ρ. If the
sample had r = +.60 and n = 400, the 95% confidence limits would be .53 and
.66, which are much closer together.




Figure 13.6. Diagram for Determining 95% Confidence Limits of the Correlation
Coefficient for Sample Sizes Ranging from 3 to 400

(Reproduced by courtesy of the author, F. N. David,
and the publisher, the Biometrika Office)
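Without the chart, a commonly used numerical substitute is Fisher's z transformation. This is a different technique from the one the diagram is based on, and it is an approximation that assumes a two-variable normal population. The Python sketch below is illustrative only; the function name and the 1.96 multiplier for approximately 95% limits are assumptions of the sketch.

```python
from math import atanh, sqrt, tanh


def fisher_z_limits(r, n, z_crit=1.96):
    """Approximate confidence limits for the population correlation coefficient.

    Fisher's z transformation: atanh(r) is treated as approximately normal
    with standard deviation 1 / sqrt(n - 3). z_crit = 1.96 gives approximately
    95% limits. The limits are transformed back with tanh.
    """
    z = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


for n in (25, 400):
    low, high = fisher_z_limits(0.60, n)
    print(f"r = .60, n = {n}: limits {low:.2f} and {high:.2f}")
```

For r = .60 this gives roughly .27 and .80 with n = 25, and .53 and .66 with n = 400, close to the chart readings quoted in the text.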
(b) Find the means, standard deviations, correlation coefficient for X
and Y, regression coefficient of Y on X, and the standard error of
estimate $s_E$.

(c) Determine the regression line of Y on X and graph it on the scatter
diagram.

(d) If a new variety of wheat is found to have 1.90% carotene, what
estimate would you make for the carotene content of flour made from
this variety?

3. Tensile strength (Y) in 1000 pounds per square inch and hardness (X) in
Rockwell's B for each of 10 specimens of aluminum die castings were found to be
as follows (from Shewhart's data):


   X      Y
 55.0   29.3
 70.2   54.9
 84.5   56.8
 55.5   50.1
 78.5   54.0
 65.5   50.8
 71.4   55.4
 55.4   51.5
 82.5   52.2
 67.5   55.4


(a) Make a scatter diagram.

(b) Find the means, standard deviations, correlation coefficient for X
and Y, regression coefficient of Y on X, and the standard error of
estimate $s_E$.

(c) Determine the regression line of Y on X and graph it on the scatter
diagram.

(d) If a die casting is found to have a Rockwell hardness of 55, what
would you estimate its tensile strength to be?

(e) Find 95% confidence limits of the regression coefficient b of the
population regression line. Interpret these confidence limits.

4. Find the means, standard deviations, correlation coefficient and standard
error of estimate $s_E$ from Problem No. 2 of Exercise 13.1. Determine the
regression line of Y on X from these quantities and plot the regression line.
If the Rapid Method yielded 55% fat in a new brand of mayonnaise, what
estimate would you make for the fat content that would be found by the
A.O.A.C. method?





5. For a certain group of 200 college entrance students, suppose the
regression line of French achievement score (Y) on verbal aptitude score (X) is

Y = .72X + 141.2 .

If it is known that the mean and standard deviation of the French achievement
test for this group are 512 and 98 respectively, and that the standard deviation
of the verbal aptitude scores for the group is 100, what is the mean of the
verbal aptitude scores and the correlation coefficient between X and Y? Write
down the equation of the regression line of X on Y. If the French achievement
score of a student from this group is 580, what estimate would you make for his
verbal aptitude score?

6. Suppose the correlation coefficient between the verbal aptitude score and
mathematical aptitude score of a random sample of 50 students from the class
of 1952 turns out to be .58. Assuming Figure 13.6 to be applicable to this
problem for all practical purposes, find approximately the 95% confidence
limits of the correlation coefficient between these scores for the entire class
of students.

7. Suppose a sample of 50 pairs of measurements from a population having
approximately a two-variable normal distribution yields a correlation
coefficient of .20. Is this significantly different from 0 at the 95%
probability level? (Make use of Figure 13.6.)

8. Suppose the average height of male students at College A and that of their
fathers are 68 inches and that the standard deviations of heights of sons and
fathers are equal. Suppose the heights of sons (eldest if there is more than
one son at College A) and fathers have a correlation coefficient of .50. Graph
the regression line of sons' height on fathers' height. What can you say
about the average height of sons of fathers of a given height above average
height for fathers? Similarly for sons of fathers of a height below average?


13.3 Simplified Computation of Coefficients for Regression Line.

The computations involved in Table 13.3 are straightforward but cumbersome.
Simplified, but less direct, computational schemes can be devised which will
lighten the arithmetic. We shall consider two schemes: the first is a scheme
making use of a working origin, which is useful for fewer than 40 or 50 pairs
of measurements, and the second is a fully coded scheme useful for more than
this number of pairs of measurements.


13.31 Computation by using a working origin.

Let us return to the example of 17 pairs of scores. We choose a convenient
working origin for the X measurement near the middle of the set of values of X
in the sample; 50 will be satisfactory. Similarly, let us choose 40 as a
working origin for the Y measurement. Denote the deviations X - 50 and Y - 40
by X' and Y' respectively. Then

X = X' + 50
Y = Y' + 40 .


Using the results of Section 3.41, we have

$$\bar X = \bar X' + 50, \qquad \bar Y = \bar Y' + 40, \qquad s_X^2 = s_{X'}^2, \qquad s_Y^2 = s_{Y'}^2, \qquad S[(X-\bar X)(Y-\bar Y)] = S[(X'-\bar X')(Y'-\bar Y')], \tag{13.23}$$

i.e., cov(X, Y) = cov(X', Y'). Therefore, the correlation coefficient between
X and Y is equal to the correlation coefficient between X' and Y', which means
we can compute r by using the formula

$$r = \frac{\mathrm{cov}(X',Y')}{s_{X'}\,s_{Y'}} . \tag{13.24}$$

We can evaluate S(X'), S(Y'), S(X'Y'), S(X'²), S(Y'²) by constructing Table
13.4, and from these we determine the values of $\bar X$, $\bar Y$, $s_X$,
$s_Y$ and r, as shown below Table 13.4.

In general, suppose we have a sample of n pairs of measurements
$(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. Now let $X_0$ be the working
origin of X measurements, and $Y_0$ the origin of Y measurements. Then let X'
and Y' be defined as

$$X' = X - X_0, \qquad Y' = Y - Y_0 .$$
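$$S(X') = 96, \qquad S(Y') = 10, \qquad \bar X = \bar X' + 50 = 55.65, \qquad \bar Y = \bar Y' + 40 = 40.59$$

$$S(X'^2) = 2736, \qquad s_X^2 = s_{X'}^2 = \frac{1}{16}\left[2736 - \frac{(96)^2}{17}\right] = 137.1176, \qquad s_X = s_{X'} = 11.71$$

$$S(Y'^2) = 2902, \qquad s_Y^2 = s_{Y'}^2 = \frac{1}{16}\left[2902 - \frac{(10)^2}{17}\right] = 181.0074, \qquad s_Y = s_{Y'} = 13.45$$

$$S(X'Y') = 1898, \qquad \mathrm{cov}(X',Y') = \frac{1}{16}\left[1898 - \frac{(96)(10)}{17}\right] = 115.0956$$

$$r = \frac{115.0956}{\sqrt{(137.1176)(181.0074)}} = .7306 .$$

Having these values, we can then proceed to substitute them in formulas such
as (13.11), (13.12) and (13.18).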
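Shifting the origin changes the means but leaves the variances and the covariance unchanged. The Python sketch below checks this with the 17 pairs of scores, once from the original sums and once from the sums of the deviations X' = X - 50 and Y' = Y - 40 (S(X') = 96, S(Y') = 10, S(X'²) = 2736, S(Y'²) = 2902, S(X'Y') = 1898). The helper name `moments_from_sums` is hypothetical.

```python
def moments_from_sums(n, sx, sy, sxx, syy, sxy):
    """Return (mean X, mean Y, s_X^2, s_Y^2, cov) by the computing formulas."""
    var_x = (sxx - sx ** 2 / n) / (n - 1)
    var_y = (syy - sy ** 2 / n) / (n - 1)
    cov = (sxy - sx * sy / n) / (n - 1)
    return sx / n, sy / n, var_x, var_y, cov


original = moments_from_sums(17, 946, 690, 54_836, 30_902, 40_238)
shifted = moments_from_sums(17, 96, 10, 2_736, 2_902, 1_898)

# The means differ by the working origins (50 and 40); the other three agree.
print([round(v, 4) for v in original])
print([round(v, 4) for v in shifted])
```

Working with the small deviations keeps the hand arithmetic short, which is the point of the scheme.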


13.32 Computation by using a fully coded scheme.

The method described in Section 13.31 is useful for computing the means,
variances and the covariance when the number of pairs of measurements does not
exceed 40 or 50. If there is a larger number of measurements, it is worthwhile
to consider a scheme which involves grouping the observations with respect to
X and also with respect to Y. We determine cell lengths, cell boundaries and
cell midpoints for X and for Y by following procedures similar to those
described earlier for grouping a single set of measurements, and introduce new
units of measurement and arbitrary origins for both variables. We use the cell
lengths of the X distribution and of the Y distribution as the new units of
measurement.

If $X_0$ and $Y_0$ are the arbitrary origins for X and Y respectively, if the
new units for the X and Y measurements are h and k, and if the new variables
for measuring X and Y are Z and W, we may then write

$$X = X_0 + hZ, \qquad Y = Y_0 + kW, \tag{13.26}$$

and compute the means, variances and covariance for the X and Y measurements
from $X_0$, $Y_0$, h, k and the means, variances and covariance for Z and W.

Consider a sample of n pairs of measurements
$(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. For any one of these pairs of
measurements, say $(X_j, Y_j)$, there is a pair of measurements $(Z_j, W_j)$
which we can determine by substituting $X = X_j$, $Y = Y_j$ in (13.26) and
solving for Z and W. From Section 3.42 we may write down at once the following
formulas:

$$\bar X = X_0 + h\bar Z, \qquad \bar Y = Y_0 + k\bar W, \qquad s_X^2 = h^2 s_Z^2, \qquad s_Y^2 = k^2 s_W^2 . \tag{13.27}$$

We must now find a formula for calculating the covariance. We have

$$\mathrm{cov}(X,Y) = \frac{1}{n-1}S[(X-\bar X)(Y-\bar Y)] = \frac{1}{n-1}S[(hZ - h\bar Z)(kW - k\bar W)] = hk\,\frac{1}{n-1}S[(Z-\bar Z)(W-\bar W)],$$

since h and k are constants. Hence

$$\mathrm{cov}(X,Y) = hk\,\mathrm{cov}(Z,W) . \tag{13.28}$$


Therefore, since $s_X = hs_Z$ and $s_Y = ks_W$, we have for the regression
coefficient and correlation coefficient

$$\hat b = \frac{\mathrm{cov}(X,Y)}{s_X^2} = \frac{hk\,\mathrm{cov}(Z,W)}{h^2 s_Z^2} = \frac{k}{h}\,\frac{\mathrm{cov}(Z,W)}{s_Z^2}, \qquad r = \frac{\mathrm{cov}(X,Y)}{s_X s_Y} = \frac{hk\,\mathrm{cov}(Z,W)}{(hs_Z)(ks_W)} = \frac{\mathrm{cov}(Z,W)}{s_Z s_W} . \tag{13.29}$$

Thus, the correlation coefficient between X and Y has the same value as that
between Z and W. This is another way of saying that the correlation
coefficient for a scatter diagram remains the same no matter where the origin
is located, and no matter what units are used to describe the measurements.

Formulas (13.26), (13.27), (13.28) and (13.29) are the formulas for the fully
coded computation of the means, variances, covariance, regression coefficient
and correlation coefficient.
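A Python sketch of the coded computation, checked against a direct computation, follows. It uses hypothetical data chosen to lie exactly on cell midpoints, so grouping introduces no error. The origins, the unit lengths h = 5 and k = 2, and the function names are all assumptions of the example.

```python
from math import sqrt


def stats(xs, ys):
    """Means, variances, covariance, b (Y on X) and r, computed directly."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vy = sum((y - my) ** 2 for y in ys) / (n - 1)
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, vx, vy, c, c / vx, c / sqrt(vx * vy)


def decode(z_stats, x0, y0, h, k):
    """Convert statistics of the coded variables Z, W back to X, Y
    using (13.27), (13.28) and (13.29)."""
    mz, mw, vz, vw, czw, _, r = z_stats
    return (x0 + h * mz, y0 + k * mw, h * h * vz, k * k * vw,
            h * k * czw, (k / h) * czw / vz, r)


# Hypothetical measurements lying on cell midpoints.
xs = [553, 558, 563, 568, 568]
ys = [10, 12, 12, 16, 14]
x0, y0, h, k = 558, 12, 5, 2

zs = [(x - x0) / h for x in xs]  # coded X: Z = (X - X0) / h
ws = [(y - y0) / k for y in ys]  # coded Y: W = (Y - Y0) / k

direct = stats(xs, ys)
coded = decode(stats(zs, ws), x0, y0, h, k)
print([round(v, 6) for v in direct])
print([round(v, 6) for v in coded])
```

With real data the measurements are replaced by their cell midpoints, so the coded results are close to, but not exactly equal to, the direct ones.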

Let us illustrate the computations by an example. Table 13.5 (data by
Milbourn) shows measurements of initial thickness (X) and final thickness (Y)
in .0001" units at 88 positions on a coil of sheet metal, the X measurement
being obtained before and the Y measurement after a rolling operation.

The first thing we do with the measurements in Table 13.5 is to make a two-way
frequency table. It is convenient to select cell lengths of 5 for both
variables, i.e.,

h = k = 5 .

The following cell midpoints are convenient:

for X: 553, 558, ..., 623
for Y: 398, 403, ..., 453 .


TABLE 13.5

Thickness before Rolling (X) and Thickness after Rolling (Y)
at 88 Positions of a Coil of Sheet Metal

  X    Y      X    Y      X    Y      X    Y
 577  408    581  427    610  439    607  442
 568  397    601  432    604  437    602  431
 568  409    597  432    605  448    608  437
 589  406    592  439    609  435    622  451
 590  413    588  423    602  442    614  437
 575  412    595  428    606  435    588  432
 571  398    599  438    610  439    598  431
 572  410    594  428    612  450    608  431
 576  409    591  438    604  431    612  446
 583  419    597  430    598  437    608  431
 580  417    601  432    598  432    602  433
 569  402    601  437    615  437    601  434
 578  415    596  437    612  450    599  426
 581  412    593  438    608  432    602  440
 589  415    605  438    603  438    596  426
 577  421    605  438    608  444    592  423
 579  407    594  442    608  437    598  429
 574  425    598  430    611  448    603  422
 592  419    601  439    605  433    593  427
 594  421    608  444    602  434    567  404
 583  428    610  438    608  443    566  406
 572  409    598  443    612  438    553  404


We now construct the two-way frequency table, shown bordered by the heavy
lines in Table 13.6, tallying the frequencies with which the various
combinations of values of X and Y fall into the various cells. In an actual
situation, it is sufficient to put the tally marks in light pencil in the upper
left-hand corner of each cell and then erase them when they are counted. Hence,
we shall show the total frequency in each cell and not the tally marks. We
shall record only cell midpoints, not cell boundaries.

TABLE 13.6

Looking at Table 13.6, you will see that rows (a), (b), (c), (d), (e), (f) and
columns (g), (h), (i), (j), (k), (l) have been added for computational
purposes. Row (a) is for the values (integral values) of Z, the coded variable
for measurement X. Row (b) is the frequency distribution of values of Z; the
frequencies are obtained by adding the frequencies in the two-way frequency
table by columns. The sum of the entries in row (b) is n, the number of pairs
of measurements in the sample. Row (c) is for the product of values of Z and
the frequency, and the sum of its entries gives S(Z). Row (d) is for the
product of Z² and the frequency, and the sum of its entries gives the value of
S(Z²). Any entry in row (e) is obtained by multiplying the frequencies in the
column of the two-way frequency table corresponding to that entry by their W
values and adding. For instance, we obtain the entry 15 as follows:

(4)(1) + (3)(3) + (2)(0) + (1)(3) + (0)(2) + (-1)(1)
  = 4 + 9 + 0 + 3 + 0 - 1 = 15 .

The entries in row (f) are obtained by multiplying corresponding entries in
row (e) and row (a). The sum of the entries in row (f) is the sum of the
products ZW for each entry in the two-way table, i.e., S(ZW). Columns (g), (h),
(i), (j), (k), (l) give similar results for the Y measurements (coded as W)
and their distribution. Since the total of row (f) and the total of column (l)
are each equal to S(ZW), we have a convenient computational check. Applying
formulas (13.26), (13.27), (13.28), (13.29), the computations of the means,
variances, covariance, regression coefficient and correlation coefficient for
X and Y from the material in Table 13.6 are as follows:

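$$X_0 = 588, \qquad Y_0 = 423, \qquad h = k = 5, \qquad n = 88$$

$$\bar{Z} = \frac{128}{88} = 1.4545, \qquad \bar{W} = \frac{107}{88} = 1.2159$$

$$\bar{X} = 588 + 5(1.4545) = 595.27, \qquad \bar{Y} = 423 + 5(1.2159) = 429.08$$

$$s_Z^2 = \frac{1}{87}\left[870 - \frac{(128)^2}{88}\right] = 7.8600, \qquad s_X = 5\sqrt{7.8600} = 14.02$$

$$s_W^2 = \frac{1}{87}\left[733 - \frac{(107)^2}{88}\right] = 6.9299, \qquad s_Y = 5\sqrt{6.9299} = 13.16$$

$$\operatorname{cov}(Z, W) = \frac{1}{87}\left[720 - \frac{(128)(107)}{88}\right] = 6.4869$$

$$b = \frac{5(6.4869)}{5(7.8600)} = .83, \qquad r = \frac{6.4869}{\sqrt{(7.8600)(6.9299)}} = .88$$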

The equation of the regression line of Y on X is 


(Y - 429.08) = .83 (X - 595.27) . 


A general table similar to the illustrative Table 13.6 could be constructed and discussed, but this seems unnecessary. You will be able to see the full generality of the fully coded computational scheme from the example we have given.

There are other fully coded computational schemes for calculating means, variances, and the covariance, but we shall not consider them here.
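The whole fully coded scheme can be condensed into a short Python sketch. It is an illustration under stated assumptions, not the book's own procedure: it assumes formulas (13.26) to (13.29) use the divisor n - 1 (87 in the Table 13.6 computation), and that the two-way table is supplied as a dictionary mapping coded cells (Z, W) to frequencies. The row (e) and column (k) totals are used for the computational check on S(ZW). The function names `coded_sums` and `decode` are hypothetical. Feeding in the sums of Table 13.6 (n = 88, S(Z) = 128, S(W) = 107, S(Z²) = 870, S(W²) = 733, S(ZW) = 720) should reproduce the values computed above.

```python
import math


def coded_sums(table):
    """Return n, S(Z), S(W), S(Z^2), S(W^2), S(ZW) from a {(Z, W): frequency} table."""
    n = sum(table.values())
    sz = sum(f * z for (z, w), f in table.items())
    sw = sum(f * w for (z, w), f in table.items())
    szz = sum(f * z * z for (z, w), f in table.items())
    sww = sum(f * w * w for (z, w), f in table.items())
    # Row (e): for each Z column, add frequency * W down the column; row (f) multiplies by Z.
    row_e = {}
    for (z, w), f in table.items():
        row_e[z] = row_e.get(z, 0) + w * f
    szw_rows = sum(z * e for z, e in row_e.items())
    # Column (k): for each W row, add frequency * Z across the row; column (l) multiplies by W.
    col_k = {}
    for (z, w), f in table.items():
        col_k[w] = col_k.get(w, 0) + z * f
    szw_cols = sum(w * e for w, e in col_k.items())
    assert szw_rows == szw_cols, "computational check on S(ZW) failed"
    return n, sz, sw, szz, sww, szw_rows


def decode(n, sz, sw, szz, sww, szw, x0, h, y0, k):
    """Convert coded sums back to means, standard deviations, b (Y on X) and r."""
    var_z = (szz - sz * sz / n) / (n - 1)
    var_w = (sww - sw * sw / n) / (n - 1)
    cov_zw = (szw - sz * sw / n) / (n - 1)
    return {
        "xbar": x0 + h * sz / n,
        "ybar": y0 + k * sw / n,
        "s_x": h * math.sqrt(var_z),
        "s_y": k * math.sqrt(var_w),
        "b": (k / h) * cov_zw / var_z,
        "r": cov_zw / math.sqrt(var_z * var_w),
    }


# Coded sums from the Table 13.6 computation.
stats = decode(88, 128, 107, 870, 733, 720, x0=588, h=5, y0=423, k=5)
print({name: round(value, 2) for name, value in stats.items()})
```

Keeping the tabulation (`coded_sums`) separate from the decoding (`decode`) mirrors the text: only the decoding step depends on the working origins and cell widths.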


Exercise 13.3.


1. Each of fifteen expert riflemen fired a set of rounds in a kneeling position and a set of rounds in a standing position. Each man obtained a score X for his firings from the kneeling position, and a score Y for his firings from the standing position. The pairs of scores for the 15 men were as follows (from Scarborough and Wagner):


X     Y        X     Y
91    78       93    82
93    85       88    71
91    82       88    83
89    79       92    89
90    75       91    84
91    87       92    79
95    86       94    86
94    85
(a) Make a scatter diagram of these points.

(b) Find the means, standard deviations, and correlation coefficient for X and Y, making use of a working origin.

(c) Find the regression lines of Y on X and of X on Y and graph them on the scatter diagram.

(d) Find the standard error of estimate of Y from X. Also find the standard error of estimate of X from Y.

(e) If an expert rifleman makes a score of 96 from a kneeling position, what score would you estimate him to make if he fires from a standing position? If such a rifleman makes a score of 70 from a standing position, what score would you estimate him to make from a kneeling position?

2. Thirty prepared specimens of a synthetic rubber (Neoprene GN) were tested for abrasion loss in cc. per H.P. hour (Y) and hardness in degrees Shore (X). The following data (from Buist and Davies) were obtained:


X     Y        X     Y        X     Y
45    372      64    164      71    219
55    206      68    113      80    186
61    175      79    182      82    155
66    154      81     32      89    114
71    136      56    228      51    341
71    112      68    196      59    340
81     55      75    128      65    283
86     45      83     97      74    267
53    221      88     64      81    215
60    166      59    249      86    148


(a) Make a scatter diagram.

(b) Find the regression line of Y on X and plot it on the scatter diagram.

(c) Find the standard error of estimate s and draw a line on each side of the regression line, parallel to it and at a vertical distance s from it.

(d) Find confidence limits of b and r.

(e) If a specimen of the rubber should have a hardness of 78, what estimate would you make for abrasion loss?


3. The 67 Economics Departmental students of the Princeton class of 1938 had the following Verbal Scores (X) and Mathematical Scores (Y) on the Scholastic Aptitude Test (S.A.T.):
X     Y        X     Y        X     Y
345   577      523   550      585   537
395   569      556   550      593   717
563   608      479   678      417   496
543   505      629   614      486   582
402   705      490   640      604   647
472   531      730   556      515   620
691   577      611   730      523   629
624   556      468   614      545   511
523   634      574   453      505   756
461   357      420   621      527   621
490   589      596   614      384   527
530   672      585   498      431   524
516   640      354   569      574   627
444   499      494   698      494   543
604   543      439   595      560   466
406   473      446   543      464   589
475   569      505   511      549   543
585   679      585   672      541   705
523   608      468   653      468   582
582   556      578   466      629   595
575   549      603   634      607   563
439   350      417   666      490   498
549   537


(a) Make a scatter diagram for these 67 pairs of scores.

(b) The mean Verbal Score and the mean Mathematical Score of all College Board candidates taking the S.A.T. are each 500. From the scatter diagram of the 67 pairs of scores, determine how many of the Economics Departmental students have scores: (i) above average on both the Verbal and the Mathematical parts of the S.A.T., (ii) below average on both, (iii) above average on the Mathematical part and below average on the Verbal part, and (iv) above average on the Verbal part and below average on the Mathematical part.

(c) By using one of the simplified computational schemes, calculate the means, standard deviations and correlation coefficient for X and Y.

(d) Write down the regression equation of Y on X and plot it on the scatter diagram.

(e) Find the standard error of estimate of Y from X.

4. A mathematics test consisting of 5 subtests was given to a group of freshman students at a certain military academy. A study was made of the relationship between the Part I Score (X) and the Total Score (Y) of the test. The pairs of scores for the 139 students are given in the following two-way frequency table, in which the cell midpoints are given at the left and below.
(a) By use of the fully coded scheme, determine the means, standard deviations, and correlation coefficient.

(b) Find the equation of the regression line of Y on X.

(c) Find the standard error of estimate of Y from X.

(d) What estimate would you make for the Total Score of a student who took only Part I and made a score of 14 on it?


13.4 ... the Method of Least Squares.

In Section 13.2 we discussed the problem of fitting a regression line of the form Y = a + bX to a sample of n points by the method of least squares. While this is one of the most important cases which arise in elementary statistical analysis, it is to be emphasized that the method of least squares is used for fitting other kinds of regression lines and curves.


13.41 Fitting a line through the origin by least squares.

There are problems in which it may be sufficient to fit a straight line of the form Y = bX, i.e., cases in which we may safely put a = 0 at the outset. The example considered in Section 13.1 is such a case. There it will be sufficient for routine purposes to fit the very simple formula Y = bX, from which accurate estimates of the concentration of alpha-resin can be made from colorimeter readings by a simple multiplication. To fit the line Y = bX to the data of Table 13.1, proceed as we did in Section 13.2 for fitting the line Y = a + bX, and consider the sum of squares of differences between actual values of Y and estimated values of Y, i.e.,

$$S = (.12 - 8b)^2 + (.71 - 50b)^2 + \cdots + (2.50 - 181b)^2.$$

The value of b which minimizes S is that for which $dS/db = 0$. This gives

$$-2(8)(.12 - 8b) - 2(50)(.71 - 50b) - \cdots - 2(181)(2.50 - 181b) = 0.$$

Dividing by 2 and collecting terms we get

$$-991.01 + 71890b = 0,$$

that is, $-S(XY) + bS(X^2) = 0$, from which

$$b = \frac{S(XY)}{S(X^2)} = \frac{991.01}{71890} = .0138.$$

The least squares fitted line is therefore

$$Y = .0138X.$$

If this line is graphed on Figure 13.1, it does not differ perceptibly from the line already plotted "by eye", thus indicating that when there is a very close straight-line relationship between two variables, little, if anything, is to be gained by using least squares.
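The result $b = S(XY)/S(X^2)$ for a line through the origin takes only a few lines of Python. This is a minimal sketch; the function name `fit_through_origin` is hypothetical, and only the two sums quoted above for Table 13.1 are used as a numerical check, since the full table is not given in this section.

```python
def fit_through_origin(xs, ys):
    """Least squares slope b for Y = bX: setting dS/db = 0 gives b = S(XY) / S(X^2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


# Check against the sums quoted in the text: S(XY) = 991.01, S(X^2) = 71890.
print(round(991.01 / 71890, 4))
```

Note that no means are subtracted here: forcing the line through the origin replaces the centered sums used for Y = a + bX with raw sums.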


13.42 Fitting parabolas and higher degree polynomials.

There are situations in which we may wish to fit a parabola of the form

$$Y = a + bX + cX^2, \tag{13.30}$$

a cubic of the form

$$Y = a + bX + cX^2 + dX^3,$$

or a polynomial of higher degree.

The procedure here is just as before: we take the difference between each actual Y and estimated Y, square the difference and form S, the sum of the squared differences. S will involve the undetermined constants a, b, c, etc. The values of a, b, c, etc., which minimize S are those which satisfy the simultaneous equations obtained by setting $\partial S/\partial a = 0$, $\partial S/\partial b = 0$, $\partial S/\partial c = 0$, etc. These equations are called normal equations. There are as many normal equations as constants to be determined.

To illustrate the case for 3 constants, let us fit $Y = a + bX + cX^2$ to the following data:

X     0     1     2     3
Y    -.1   1.1   3.9   9.2

For S we have

$$S = (-.1 - a)^2 + (1.1 - a - b - c)^2 + (3.9 - a - 2b - 4c)^2 + (9.2 - a - 3b - 9c)^2.$$

The normal equations are

$$\begin{aligned}
\frac{\partial S}{\partial a} &= -2(-.1 - a) - 2(1.1 - a - b - c) - 2(3.9 - a - 2b - 4c) - 2(9.2 - a - 3b - 9c) = 0\\
\frac{\partial S}{\partial b} &= -2(1.1 - a - b - c) - 2(2)(3.9 - a - 2b - 4c) - 2(3)(9.2 - a - 3b - 9c) = 0\\
\frac{\partial S}{\partial c} &= -2(1.1 - a - b - c) - 2(4)(3.9 - a - 2b - 4c) - 2(9)(9.2 - a - 3b - 9c) = 0.
\end{aligned}$$

Dividing by 2 and simplifying, we have

$$\begin{aligned}
-14.1 + 4a + 6b + 14c &= 0\\
-36.5 + 6a + 14b + 36c &= 0\\
-99.5 + 14a + 36b + 98c &= 0.
\end{aligned}$$

Solving these equations we get

$$a = -.055, \qquad b = -.005, \qquad c = 1.025,$$

and hence the equation of the regression curve of Y on X is

$$Y = -.055 - .005X + 1.025X^2.$$
In general, for n pairs of measurements $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, we have

$$S = \sum_{j=1}^{n} (Y_j - a - bX_j - cX_j^2)^2,$$

or more briefly

$$S = S(Y - a - bX - cX^2)^2. \tag{13.31}$$

The normal equations are

$$\begin{aligned}
-2S(Y - a - bX - cX^2) &= 0\\
-2S[X(Y - a - bX - cX^2)] &= 0\\
-2S[X^2(Y - a - bX - cX^2)] &= 0.
\end{aligned}$$

Dividing by 2 and writing the coefficients of these equations out explicitly, we have

$$\begin{aligned}
-S(Y) + aS(1) + bS(X) + cS(X^2) &= 0\\
-S(XY) + aS(X) + bS(X^2) + cS(X^3) &= 0\\
-S(X^2Y) + aS(X^2) + bS(X^3) + cS(X^4) &= 0,
\end{aligned} \tag{13.32}$$

which shows the general structure of the normal equations, and also indicates how one would set up normal equations for determining 4, 5 or any number of constants. In any particular example there may be various shortcuts to solving the normal equations. The most systematic method is successive elimination, but we shall not go into this method here.
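To show how the normal equations (13.32) extend to any degree, here is a Python sketch that builds them from the power sums $S(X^p)$ and $S(X^pY)$ and solves them by successive (Gaussian) elimination with partial pivoting, the method mentioned above but not developed in the text. The function names `solve` and `fit_polynomial` are hypothetical. It is applied to the four points of the parabola example, taking Y = -.1, 1.1, 3.9, 9.2 at X = 0, 1, 2, 3 (the Y values that give the sums 14.1 and 99.5 appearing in its normal equations).

```python
def solve(matrix, rhs):
    """Solve a small linear system by successive elimination with partial pivoting."""
    m = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]  # augmented matrix
    for col in range(m):
        # Bring the largest remaining pivot into place, then eliminate below it.
        pivot = max(range(col, m), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, m):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[i][j] -= factor * aug[col][j]
    # Back substitution.
    x = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - tail) / aug[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least squares coefficients [a, b, c, ...] of Y = a + bX + cX^2 + ... ."""
    m = degree + 1

    def s_x(p):
        return sum(x ** p for x in xs)  # S(X^p); S(X^0) = S(1) = n

    def s_xy(p):
        return sum(x ** p * y for x, y in zip(xs, ys))  # S(X^p Y)

    # Row i is the normal equation obtained from the derivative with respect to the X^i coefficient.
    matrix = [[s_x(i + j) for j in range(m)] for i in range(m)]
    rhs = [s_xy(i) for i in range(m)]
    return solve(matrix, rhs)


xs = [0, 1, 2, 3]
ys = [-0.1, 1.1, 3.9, 9.2]
print([round(c, 3) for c in fit_polynomial(xs, ys, 2)])
```

For degree 2 the matrix is exactly the coefficient array of (13.32); raising `degree` to 3 gives the four normal equations for a cubic.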

The job of finding the variance of estimate of Y from X using the regression equation (13.30) amounts to inserting the solutions a, b, c of the equations (13.32) into (13.31), finding the sum of the squares and dividing by n - 3, the number of degrees of freedom. If there are not many pairs of measurements, say fewer than 20, the simplest procedure is to evaluate each difference in the sum of squares, square it and add the squares. For more pairs of measurements, we can express the sum of squares in terms of a, b, c, and sums of various powers and products of X and Y. For, squaring $(Y - a - bX - cX^2)$ and summing over all n pairs of measurements, we have

$$\begin{aligned}
S(Y - a - bX - cX^2)^2 = {} & S(Y^2) + a^2S(1) + b^2S(X^2) + c^2S(X^4)\\
& + 2abS(X) + 2acS(X^2) + 2bcS(X^3) - 2aS(Y) - 2bS(XY) - 2cS(X^2Y).
\end{aligned} \tag{13.33}$$

Evaluating all the terms on the right-hand side of (13.33) from the data, inserting their values and dividing the result by n - 3, gives the variance of estimate.
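The two routes to the variance of estimate can be compared in a short Python sketch: the direct sum of squared differences, and the term-by-term expansion (13.33). Both are divided by n - 3. The function name `variance_of_estimate` is hypothetical; the data and coefficients are those of the parabola example (with Y = 3.9 at X = 2). In floating-point arithmetic the expanded form can lose accuracy through cancellation when the individual sums are large, which is one reason the direct route is preferable whenever it is practical.

```python
def variance_of_estimate(xs, ys, a, b, c):
    """Return (direct, expanded) variance of estimate about Y = a + bX + cX^2, on n - 3 d.f."""
    n = len(xs)
    # Direct route: evaluate each difference, square it and add.
    direct = sum((y - a - b * x - c * x ** 2) ** 2 for x, y in zip(xs, ys))

    def s(term):
        """Book-style summation S(.) of term(x, y) over all n pairs."""
        return sum(term(x, y) for x, y in zip(xs, ys))

    # Expanded route, term by term as in (13.33); S(1) = n.
    expanded = (s(lambda x, y: y ** 2) + a ** 2 * n + b ** 2 * s(lambda x, y: x ** 2)
                + c ** 2 * s(lambda x, y: x ** 4) + 2 * a * b * s(lambda x, y: x)
                + 2 * a * c * s(lambda x, y: x ** 2) + 2 * b * c * s(lambda x, y: x ** 3)
                - 2 * a * s(lambda x, y: y) - 2 * b * s(lambda x, y: x * y)
                - 2 * c * s(lambda x, y: x ** 2 * y))
    return direct / (n - 3), expanded / (n - 3)


xs = [0, 1, 2, 3]
ys = [-0.1, 1.1, 3.9, 9.2]
print(variance_of_estimate(xs, ys, -0.055, -0.005, 1.025))
```

Here the four differences are -.045, .135, -.135, .045, so both routes give .0405 on 4 - 3 = 1 degree of freedom.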


13.43 Fitting exponential functions.

In growth and other types of problems, we are often faced with the problem of fitting a curve of the form

$$U = Ae^{bX} \tag{13.34}$$

to n pairs of measurements $(U_1, X_1), (U_2, X_2), \ldots, (U_n, X_n)$, where A and b are constants and e = 2.71828..., the base of natural logarithms.

If we take natural logarithms (i.e., $\log_e$) of both sides of (13.34) we have

$$\log_e U = \log_e A + bX. \tag{13.35}$$

Putting $\log_e U = Y$ and $\log_e A = a$, equation (13.35) may be written as

$$Y = a + bX,$$

which can be fitted to the measurements $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ [i.e., to the pairs of measurements $(X_1, \log_e U_1), (X_2, \log_e U_2), \ldots, (X_n, \log_e U_n)$] according to the procedures of Section 13.22. When a and b are found, we then find A from the equation $\log_e A = a$, or $A = e^a$, and insert A and b in (13.34), thereby obtaining the fitted curve. If the fitted form of (13.34) is plotted on semi-log graph paper, with U measured on the log scale, the resulting graph will be a straight line.
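The logarithmic transformation for (13.34) can be sketched in Python: fit Y = a + bX by ordinary least squares to the pairs $(X, \log_e U)$, then set $A = e^a$. The helper names `fit_line` and `fit_exponential` are hypothetical, and the check data are generated from $U = 2e^{0.3X}$ purely for illustration, not taken from the text. Note that this minimizes squared differences in $\log_e U$, not in U itself, which is what fitting Y = a + bX to the transformed pairs amounts to.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares estimates (a, b) for Y = a + bX."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return ybar - b * xbar, b


def fit_exponential(xs, us):
    """Fit U = A e^(bX) by least squares on Y = log_e U; returns (A, b)."""
    a, b = fit_line(xs, [math.log(u) for u in us])
    return math.exp(a), b  # A = e^a


# Illustrative data lying exactly on U = 2 e^(0.3 X).
xs = [0, 1, 2, 3, 4]
us = [2 * math.exp(0.3 * x) for x in xs]
A, b = fit_exponential(xs, us)
print(round(A, 6), round(b, 6))
```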

Sometimes, we have to fit an equation of the form

$$V = AU^b, \tag{13.36}$$

where A and b are constants, to n pairs of points $(U_1, V_1), (U_2, V_2), \ldots, (U_n, V_n)$. Taking logarithms (to the base 10, say) of both sides of (13.36), we have

$$\log V = \log A + b \log U.$$

In this case we put $\log V = Y$, $\log A = a$, $\log U = X$, and the equation becomes

$$Y = a + bX,$$

to be fitted by least squares to the n pairs of points $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ [i.e., to the pairs of measurements $(\log U_1, \log V_1), (\log U_2, \log V_2), \ldots, (\log U_n, \log V_n)$] according to the procedures of Section 13.22. When a and b are determined, the fitted form of (13.36) is

$$V = AU^b,$$

where $A = 10^a$.

If the fitted form of (13.36) is plotted on log-log graph paper, the resulting graph is a straight line.
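A similar Python sketch for (13.36) uses base-10 logarithms: fit Y = a + bX to the pairs $(\log U, \log V)$, then set $A = 10^a$. The helper names are hypothetical, and the check data are generated from $V = 3U^{1.5}$ only to confirm the arithmetic.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares estimates (a, b) for Y = a + bX."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return ybar - b * xbar, b


def fit_power(us, vs):
    """Fit V = A U^b by least squares on log V = log A + b log U (base 10); returns (A, b)."""
    a, b = fit_line([math.log10(u) for u in us], [math.log10(v) for v in vs])
    return 10 ** a, b  # A = 10^a


# Illustrative data lying exactly on V = 3 U^1.5.
us = [1, 2, 4, 8, 16]
vs = [3 * u ** 1.5 for u in us]
A, b = fit_power(us, vs)
print(round(A, 6), round(b, 6))
```

The choice of base does not affect b; it only changes how A is recovered from a (here $10^a$ rather than $e^a$).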


13.44 Multiple linear regression.

The ideas discussed in the foregoing paragraphs can be extended to situations in which we may wish to estimate a measurement from several other correlated measurements rather than one. For example, suppose we have n triples of measurements $(X_1, Y_1, Z_1), (X_2, Y_2, Z_2), \ldots, (X_n, Y_n, Z_n)$ and that we wish to find a regression function of X and Y for estimating Z. The simplest type of regression function is a linear one of the form

$$Z = a + bX + cY. \tag{13.37}$$

The constants a, b, c can be determined by least squares, just as a, b, c were determined in fitting the form $Y = a + bX + cX^2$ in Section 13.42. In fact, if we replace $X^2$ by Y and Y by Z in $Y = a + bX + cX^2$, we have (13.37). We shall not carry out the details. The procedure extends to sets of 4, 5, 6 or any number of measurements, where one wishes to estimate one of them from the remaining measurements. In the case of 3 measurements, we may think of our n triples of measurements as n points plotted in 3 dimensions, thereby obtaining a 3-dimensional scatter diagram. When we fit (13.37) by least squares, we are fitting a plane to the 3-dimensional scatter so as to be able to do the best job of estimating Z for any triple of measurements, knowing X and Y.

Such regression analysis is called multiple linear regression analysis and has many technical ramifications which are beyond the scope of this course. The basic principle of fitting such regression functions is simple: it is based on least squares. Such analysis is used extensively in the analysis of psychological tests, economic analysis, etc. The main problem in fitting such functions is to make sure that the 3- and higher-dimensional scatter diagrams are not curved or twisted around so that they cannot be fitted by such functions. This problem is not very serious in some of the fields referred to, particularly psychological tests.
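For (13.37), making the replacements just described in (13.32) gives the three normal equations $-S(Z) + an + bS(X) + cS(Y) = 0$, $-S(XZ) + aS(X) + bS(X^2) + cS(XY) = 0$ and $-S(YZ) + aS(Y) + bS(XY) + cS(Y^2) = 0$. The Python sketch below builds them and solves them by Cramer's rule, chosen here only because it is compact for a 3 by 3 system; the text does not prescribe a solution method. The function names and the check points, which lie exactly on the plane Z = 1 + 2X - .5Y, are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(xs, ys, zs):
    """Least squares (a, b, c) for Z = a + bX + cY from the three normal equations."""
    n = len(xs)
    sx, sy, sz = sum(xs), sum(ys), sum(zs)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxz = sum(x * z for x, z in zip(xs, zs))
    syz = sum(y * z for y, z in zip(ys, zs))
    matrix = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs = [sz, sxz, syz]
    d = det3(matrix)
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k of the coefficient matrix by the right-hand side.
        mk = [[rhs[i] if j == k else matrix[i][j] for j in range(3)] for i in range(3)]
        coeffs.append(det3(mk) / d)
    return tuple(coeffs)


# Illustrative triples lying exactly on Z = 1 + 2X - 0.5Y.
xs = [0, 1, 0, 1, 2]
ys = [0, 0, 1, 1, 3]
zs = [1 + 2 * x - 0.5 * y for x, y in zip(xs, ys)]
a, b, c = fit_plane(xs, ys, zs)
print(round(a, 6), round(b, 6), round(c, 6))
```

If the points all lie on one line in the (X, Y) plane, the determinant `d` is zero and no unique plane exists; the sketch does not guard against that case.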


Exercise 13.4. 

1. In determining the volume of red cells in defibrinated beef blood by a certain method, the following results were obtained for various dilutions of the blood in its own serum, tests being made on three specimens of blood at each dilution (McLain data):


Percent of whole          Percent of volume
blood in mixture          occupied by red cells
       X                          Y
      100                        40.5
      100                        44.5
      100                        46.0
       90                        35.5
       90                        40.5
       90                        41.0
       80                        32.0
       80                        36.0
       80                        36.5
       50                        20.0
       50                        22.0
       50                        23.0


Fit the line Y = bX to this data by least squares. Graph the data and the fitted line. What is the interpretation of b?

2. Fit a regression equation of the form $Y = a + bX + cX^2$ to the data of Problem 3 in Exercise 13.1.


3. Suppose measurements $Y_1, Y_2, \ldots, Y_n$ are taken from a population, made at times $X_1, X_2, \ldots, X_n$. We may then think of $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ as pairs of measurements. If the regression line Y = a is fitted to these points by least squares, show that $a = \bar{Y}$ and that the variance of estimate of Y from X using this regression line is $s_Y^2$.
Fit the linear regression equation Z = a + bX + cY to this data by least squares. What estimate would you make for the Final Examination score for a student who made 40 on Test I and 25 on Test II?


7. The following measurements were made on 10 aluminum die castings:


Hardness in        Density in        Tensile strength
Rockwell's E       gm./cm.³          in 1000 lb./sq. in.
     X                 Y                    Z
   55.0              2.67                 29.3
   70.2              2.71                 34.9
   84.3              2.87                 36.8
   55.3              2.63                 30.1
   78.5              2.58                 34.0
   63.5              2.63                 30.8
   71.4              2.67                 35.4
   53.4              2.67                 31.3
   82.5              2.72                 32.2
   67.3              2.61                 33.4


Fit a regression function of the form Z = a + bX + cY to this data by least squares. What tensile strength would you estimate for a casting with X =